WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau

PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)



(51) International Patent Classification 6: C12Q 1/68, C07H 21/04

A1

(11) International Publication Number: WO 97/10365

(43) International Publication Date: 20 March 1997 (20.03.97)

(21) International Application Number: PCT/US96/14839

(22) International Filing Date: 13 September 1996 (13.09.96)

(30) Priority Data:
08/529,115    15 September 1995 (15.09.95)    US



(71) Applicant (for all designated States except US): AFFYMAX
TECHNOLOGIES N.V. [NL/NL]; De Ruyderkade 62,
Curaçao (AN).

(72) Inventors; and
(75) Inventors/Applicants (for US only): LOCKHART, David, J.
[US/US]; 610 Mountain View Avenue, Mountain View, CA
94041 (US). BROWN, Eugene, L. [US/US]; 1388 Walnut
Street, Newton Highlands, MA 02161 (US). WONG, Gordon
[US/US]; 239 Clark Road, Brookline, MA 02146 (US).
CHEE, Mark [AU/US]; 3199 Waverly Street, Palo Alto, CA
94306 (US). GINGERAS, Thomas, R. [US/US]; 528 Juniper
Hill Drive, Encinitas, CA 92021 (US). MITTMANN,
Michael, P. [US/US]; 2377 St. Francis Drive, Palo Alto,
CA 94303 (US). LIPSHUTZ, Robert, J. [US/US]; 970 Palo
Alto Avenue, Palo Alto, CA 94301 (US). FODOR, Stephen,
P., A. [US/US]; 3863 Nathan Way, Palo Alto, CA 94303
(US). WANG, Chunwei



(74) Agents: HUNTER, Tom et al.; Townsend and Townsend
and Crew L.L.P., 8th floor, Two Embarcadero Center, San
Francisco, CA 94111-3834 (US).

(81) Designated States: AU, CA, JP, US, European patent (AT, BE,
CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL,
PT, SE).



Published 

With international search report. 

Before the expiration of the time limit for amending the 
claims and to be republished in the event of the receipt of 
amendments. 



(54) Title: EXPRESSION MONITORING BY HYBRIDIZATION TO HIGH DENSITY OLIGONUCLEOTIDE ARRAYS






(57) Abstract 

This invention provides methods of monitoring the expression levels of a multiplicity of genes. The methods involve hybridizing
a nucleic acid sample to a high density array of oligonucleotide probes where the high density array contains oligonucleotide probes
complementary to subsequences of target nucleic acids in the nucleic acid sample. In one embodiment, the method involves providing a
pool of target nucleic acids comprising RNA transcripts of one or more target genes, or nucleic acids derived from the RNA transcripts,
hybridizing said pool of nucleic acids to an array of oligonucleotide probes immobilized on a surface, where the array comprises more than
100 different oligonucleotides and each different oligonucleotide is localized in a predetermined region of the surface, the density of the
different oligonucleotides is greater than about 60 different oligonucleotides per 1 cm², and the oligonucleotide probes are complementary
to the RNA transcripts or nucleic acids derived from the RNA transcripts; and quantifying the hybridized nucleic acids in the array.

EXPRESSION MONITORING BY HYBRIDIZATION TO HIGH
DENSITY OLIGONUCLEOTIDE ARRAYS

CROSS REFERENCE TO RELATED APPLICATIONS

This is a continuation-in-part of U.S.S.N. 08/529,115 filed on September
15, 1995, which is herein incorporated by reference for all purposes.

BACKGROUND OF THE INVENTION 
A portion of the disclosure of this patent document contains material
which is subject to copyright protection. The copyright owner has no objection to the
xerographic reproduction by anyone of the patent document or the patent disclosure in 
exactly the form it appears in the Patent and Trademark Office patent file or records, but 
otherwise reserves all copyright rights whatsoever. 

Many disease states are characterized by differences in the expression 
levels of various genes either through changes in the copy number of the genetic DNA 
or through changes in levels of transcription (e.g. through control of initiation, provision 
of RNA precursors, RNA processing, etc.) of particular genes. For example, losses and 
gains of genetic material play an important role in malignant transformation and 
progression. These gains and losses are thought to be "driven" by at least two kinds of 
genes. Oncogenes are positive regulators of tumorigenesis, while tumor suppressor genes
are negative regulators of tumorigenesis (Marshall, Cell, 64: 313-326 (1991); Weinberg,
Science, 254: 1138-1146 (1991)). Therefore, one mechanism of activating unregulated
growth is to increase the number of genes coding for oncogene proteins or to increase
the level of expression of these oncogenes (e.g., in response to cellular or environmental
changes), and another is to lose genetic material or to decrease the level of expression of
genes that code for tumor suppressors. This model is supported by the losses and gains
of genetic material associated with glioma progression (Mikkelson et al., J. Cellular
Biochem. 46: 3-8 (1991)). Thus, changes in the expression (transcription) levels of
particular genes (e.g., oncogenes or tumor suppressors) serve as signposts for the
presence and progression of various cancers.

Similarly, control of the cell cycle and cell development, as well as
diseases, are characterized by variations in the transcription levels of particular
genes. Thus, for example, a viral infection is often characterized by the elevated
expression of genes of the particular virus. For example, outbreaks of Herpes simplex,
Epstein-Barr virus infections (e.g., infectious mononucleosis), cytomegalovirus,
Varicella-zoster virus infections, parvovirus infections, human papillomavirus infections,
etc. are all characterized by elevated expression of various genes present in the
respective virus. Detection of elevated expression levels of characteristic viral genes
provides an effective diagnostic of the disease state. In particular, viruses such as herpes
simplex enter quiescent states for periods of time only to erupt in brief periods of rapid
replication. Detection of expression levels of characteristic viral genes allows detection
of such active proliferative (and presumably infective) states.

Oligonucleotide probes have long been used to detect complementary
nucleic acid sequences in a nucleic acid of interest (the "target" nucleic acid) and have
been used to detect expression of particular genes (e.g., a Northern Blot). In some assay
formats, the oligonucleotide probe is tethered, i.e., by covalent attachment, to a solid
support, and arrays of oligonucleotide probes immobilized on solid supports have been
used to detect specific nucleic acid sequences in a target nucleic acid. See, e.g., PCT
patent publication Nos. WO 89/10977 and 89/11548. Others have proposed the use of
large numbers of oligonucleotide probes to provide the complete nucleic acid sequence
of a target nucleic acid but failed to provide an enabling method for using arrays of
immobilized probes for this purpose. See U.S. Patent Nos. 5,202,231 and 5,002,867
and PCT patent publication No. WO 93/17126.

The use of "traditional" hybridization protocols for monitoring or
quantifying gene expression is problematic. For example, two or more gene products of
approximately the same molecular weight will prove difficult or impossible to
distinguish in a Northern blot because they are not readily separated by electrophoretic
methods.
Similarly, as hybridization efficiency and cross-reactivity vary with the particular
subsequence (region) of a gene being probed, it is difficult to obtain an accurate and
reliable measure of gene expression with one, or even a few, probes to the target gene.

The development of VLSIPS™ technology provided methods for
synthesizing arrays of many different oligonucleotide probes that occupy a very small
surface area. See U.S. Patent No. 5,143,854 and PCT patent publication No. WO
90/15070. U.S. Patent application Serial No. 08/082,937, filed June 25, 1993, describes
methods for making arrays of oligonucleotide probes that can be used to provide the 
complete sequence of a target nucleic acid and to detect the presence of a nucleic acid 
containing a specific nucleotide sequence. 

Prior to the present invention, however, it was unknown that high density 
oligonucleotide arrays could be used to reliably monitor message levels of a multiplicity 
of preselected genes in the presence of a large abundance of other (non-target) nucleic 
acids (e.g., in a cDNA library, DNA reverse transcribed from an mRNA, mRNA used 
directly or amplified, or polymerized from a DNA template). In addition, the prior art 
provided no rapid and effective method for identifying a set of oligonucleotide probes 
that maximize specific hybridization efficacy while minimizing cross-reactivity nor of 
using hybridization patterns (in particular hybridization patterns of a multiplicity of 
oligonucleotide probes in which multiple oligonucleotide probes are directed to each 
target nucleic acid) for quantification of target nucleic acid concentrations. 

SUMMARY OF THE INVENTION
The present invention is premised, in part, on the discovery that 
microfabricated arrays of large numbers of different oligonucleotide probes (DNA chips) 
may effectively be used to not only detect the presence or absence of target nucleic acid 
sequences, but to quantify the relative abundance of the target sequences in a complex 
nucleic acid pool. In addition, it was also a surprising discovery that relatively short 
oligonucleotide probes (e.g., 20 mer) are sufficiently specific to allow quantitation of 
gene expression in complex mixtures of nucleic acids particularly when provided as in 
high density oligonucleotide probe arrays. 
Prior to this invention it was unknown that hybridization to high density
probe arrays would permit small variations in expression levels of a particular gene to be
identified and quantified in a complex population of nucleic acids that outnumber the
target nucleic acids by 1,000 fold to 1,000,000 fold or more. It was also unknown that
the transcription levels of specific genes can be quantitated in a complex nucleic acid
mixture with only a few (e.g., less than 20 or even less than 10) relatively short
oligonucleotide probes.

Thus, this invention provides for a method of simultaneously monitoring
the expression (e.g., detecting and/or quantifying the expression) of a multiplicity of
genes. The levels of transcription for virtually any number of genes may be determined 
simultaneously. Typically, at least about 10 genes, preferably at least about 100, more 
preferably at least about 1000 and most preferably at least about 10,000 different genes 
are assayed at one time. 

The method involves providing a pool of target nucleic acids comprising
mRNA transcripts of one or more of said genes, or nucleic acids derived from the
mRNA transcripts; hybridizing the pool of nucleic acids to an array of oligonucleotide
probes immobilized on a surface, where the array comprises more than 100 different
oligonucleotides, each different oligonucleotide is localized in a predetermined region of
said surface, each different oligonucleotide is attached to the surface through a single
covalent bond, the density of the different oligonucleotides is greater than about 60
different oligonucleotides (where different oligonucleotides refers to oligonucleotides
having different sequences) per 1 cm², and the oligonucleotide probes are
complementary to the mRNA transcripts or nucleic acids derived from the mRNA
transcripts; and quantifying the hybridized nucleic acids in the array. The method can
additionally include a step of quantifying the hybridization of the target nucleic acids to
the array. The quantification preferably provides a measure of the levels of transcription
of the genes. In a preferred embodiment, the pool of target nucleic acids is one in which
the concentration of the target nucleic acids (mRNA transcripts or nucleic acids derived
from the mRNA transcripts) is proportional to the expression levels of genes encoding
those target nucleic acids.

In a preferred embodiment, the array of oligonucleotide probes is a high
density array comprising greater than about 100, preferably greater than about 1,000,
more preferably greater than about 16,000 and most preferably greater than about
65,000 or 250,000 or even 1,000,000 different oligonucleotide probes. Such high
density arrays comprise a probe density of generally greater than about 60, more
generally greater than about 100, most generally greater than about 600, often greater
than about 1000, more often greater than about 5,000, most often greater than about
10,000, preferably greater than about 40,000, more preferably greater than about
100,000, and most preferably greater than about 400,000 different oligonucleotide
probes per cm² (where different oligonucleotides refers to oligonucleotides having
different sequences). The oligonucleotide probes range from about 5 to about 50
nucleotides, preferably from about 5 to about 45 nucleotides, still more preferably from
about 10 to about 40 nucleotides and most preferably from about 15 to about 40
nucleotides in length. Particularly preferred arrays contain probes ranging from about
20 to about 25 nucleotides in length. The array may comprise more than 10,
preferably more than 50, more preferably more than 100, and most preferably more than
1000 oligonucleotide probes specific for each target gene. In a preferred embodiment,
the array comprises at least 10 different oligonucleotide probes for each gene. In
another preferred embodiment, the array comprises 20 or fewer oligonucleotides complementary
to each gene. Although a planar array surface is preferred, the array may be fabricated on a
surface of virtually any shape or even a multiplicity of surfaces.

The array may further comprise mismatch control probes. Where such 
mismatch controls are present, the quantifying step may comprise calculating the 
difference in hybridization signal intensity between each of the oligonucleotide probes
and its corresponding mismatch control probe. The quantifying may further comprise
calculating the average difference in hybridization signal intensity between each of the 
oligonucleotide probes and its corresponding mismatch control probe for each gene. 
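
The average-difference calculation described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the function name `average_difference` and the intensity values are hypothetical, and each probe pair is assumed to be given as a (perfect match, mismatch) tuple of background-corrected intensities in arbitrary units.

```python
from statistics import mean


def average_difference(pairs):
    """Return the mean of (probe - mismatch control) over one gene's probe pairs.

    `pairs` is a sequence of (probe_intensity, mismatch_intensity) tuples.
    This layout is an assumption made for illustration only.
    """
    if not pairs:
        raise ValueError("at least one probe pair is required")
    return mean(probe - mismatch for probe, mismatch in pairs)


# Hypothetical intensities for three probe pairs directed to one gene.
example_pairs = [(1200.0, 300.0), (950.0, 410.0), (1500.0, 280.0)]
print(average_difference(example_pairs))
```

Averaging over several pairs is what lets a weak or cross-hybridizing individual probe be outweighed by the other probes directed to the same gene.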

The probes present in the high density array can be oligonucleotide probes
selected according to selection and optimization methods described below.
Alternatively, non-optimal probes may be included in the array, but the probes used for
quantification (analysis) can be selected according to the optimization methods described
below.

Oligonucleotide arrays for the practice of this invention are preferably 
chemically synthesized by parallel immobilized polymer synthesis methods, more 
preferably by light directed polymer synthesis methods. Chemically synthesized arrays 
are advantageous in that probe preparation does not require cloning, a nucleic acid 
amplification step, or enzymatic synthesis. Indeed, the preparation of the probes does 
not require handling of any biological materials. 

The array includes test probes which are oligonucleotide probes each of 
which has a sequence that is complementary to a subsequence of one of the genes (or the
mRNA or the corresponding antisense cRNA) whose expression is to be detected. In 
addition, the array can contain normalization controls, mismatch controls and expression 
level controls as described herein. 

In a particularly preferred embodiment, the variation between different
copies (within and/or between batches) of each array is less than 20%, more preferably
less than about 10%, and most preferably less than about 5%, where the variation is
measured as the coefficient of variation in hybridization intensity averaged over at least 5
oligonucleotide probes for each gene whose expression the array is to detect.
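
One possible reading of this variation measure is sketched below in Python. The specification does not spell out the order of operations, so the sketch assumes that intensities are first averaged over the probes for a gene on each array copy, and that the coefficient of variation (sample standard deviation divided by mean) is then taken across copies. The function name `gene_cv` and the `min_probes` parameter are hypothetical.

```python
from statistics import mean, stdev


def gene_cv(copies, min_probes=5):
    """Coefficient of variation of one gene's mean intensity across array copies.

    `copies` holds one list of probe intensities per array copy, all for the
    same gene. Averaging per copy first is an assumed interpretation.
    """
    if len(copies) < 2:
        raise ValueError("need at least two array copies")
    if any(len(c) < min_probes for c in copies):
        raise ValueError("each copy needs at least %d probes" % min_probes)
    per_copy_means = [mean(c) for c in copies]
    return stdev(per_copy_means) / mean(per_copy_means)


# Hypothetical data: three copies of an array, five probes for one gene.
copies = [[100.0] * 5, [110.0] * 5, [90.0] * 5]
print(gene_cv(copies))  # compare against the 20%, 10%, or 5% limits
```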

The pool of nucleic acids may be labeled before, during, or after
hybridization, although in a preferred embodiment, the nucleic acids are labeled before
hybridization. Fluorescence labels are particularly preferred, more preferably labeling
with a single fluorophore, and, where fluorescence labeling is used, quantification of
the hybridized nucleic acids is by quantification of fluorescence from the hybridized
fluorescently labeled nucleic acid. Such quantification is facilitated by the use of a
fluorescence microscope which can be equipped with an automated stage to permit
automatic scanning of the array, and which can be equipped with a data acquisition
system for the automated measurement, recording and subsequent processing of the
fluorescence intensity information.

In a preferred embodiment, hybridization is at low stringency (e.g., about
20°C to about 50°C, more preferably about 30°C to about 40°C, and most preferably
about 37°C and 6X SSPE-T or lower) with at least one wash at higher stringency.
Hybridization may include subsequent washes at progressively increasing stringency until
a desired level of hybridization specificity is reached.

Quantification of the hybridization signal can be by any means known to
one of skill in the art. However, in a particularly preferred embodiment, quantification
is achieved by use of a confocal fluorescence microscope. Data is preferably evaluated
by calculating the difference in hybridization signal intensity between each
oligonucleotide probe and its corresponding mismatch control probe. It is particularly
preferred that this difference be calculated and evaluated for each gene. Particularly
preferred analytical methods are provided herein.

The pool of target nucleic acids can be the total polyA+ mRNA isolated
from a biological sample, or cDNA made by reverse transcription of the RNA or second
strand cDNA or RNA transcribed from the double stranded cDNA intermediate.
Alternatively, the pool of target nucleic acids can be treated to reduce the complexity of
the sample and thereby reduce the background signal obtained in hybridization. In one
approach, a pool of mRNAs, derived from a biological sample, is hybridized with a pool
of oligonucleotides comprising the oligonucleotide probes present in the high density
array. The pool of hybridized nucleic acids is then treated with RNase A which digests
the single stranded regions. The remaining double stranded hybridization complexes are
then denatured and the oligonucleotide probes are removed, leaving a pool of mRNAs
enhanced for those mRNAs complementary to the oligonucleotide probes in the high
density array.

In another approach to background reduction, a pool of mRNAs derived
from a biological sample is hybridized with paired target specific oligonucleotides where
the paired target specific oligonucleotides are complementary to regions flanking
subsequences of the mRNAs complementary to the oligonucleotide probes in the high
density array. The pool of hybridized nucleic acids is treated with RNase H which
digests the hybridized (double stranded) nucleic acid sequences. The remaining single
stranded nucleic acid sequences which have a length about equivalent to the region
flanked by the paired target specific oligonucleotides are then isolated (e.g., by
electrophoresis) and used as the pool of nucleic acids for monitoring gene expression.

Finally, a third approach to background reduction involves eliminating or
reducing the representation in the pool of particular preselected target mRNA messages
(e.g., messages that are characteristically overexpressed in the sample). This method
involves hybridizing an oligonucleotide probe that is complementary to the preselected
target mRNA message to the pool of polyA+ mRNAs derived from a biological sample.
The oligonucleotide probe hybridizes with the particular preselected polyA+ mRNA
(message) to which it is complementary. The pool of hybridized nucleic acids is treated
with RNase H which digests the double stranded (hybridized) region thereby separating
the message from its polyA+ tail. Isolating or amplifying (e.g., using an oligo dT
column) the polyA+ mRNA in the pool then provides a pool having a reduced or no
representation of the preselected target mRNA message.

It will be appreciated that the methods of this invention can be used to 
monitor (detect and/or quantify) the expression of any desired gene of known sequence 
or subsequence. Moreover, these methods permit monitoring expression of a large
number of genes simultaneously and effect significant advantages in reduced labor, cost
and time. The simultaneous monitoring of the expression levels of a multiplicity of
genes permits effective comparison of relative expression levels and identification of
biological conditions characterized by alterations of relative expression levels of various
genes. Genes of particular interest for expression monitoring include genes involved in
the pathways associated with various pathological conditions (e.g., cancer) and whose
expression is thus indicative of the pathological condition. Such genes include, but are
not limited to, the HER2 (c-erbB-2/neu) proto-oncogene in the case of breast cancer,
receptor tyrosine kinases (RTKs) associated with the etiology of a number of tumors
including carcinomas of the breast, liver, bladder, pancreas, as well as glioblastomas,
sarcomas and squamous carcinomas, and tumor suppressor genes such as the P53 gene
and other "marker" genes such as RAS, MSH2, MLH1 and BRCA1. Other genes of
particular interest for expression monitoring are genes involved in the immune response
(e.g., interleukin genes), as well as genes involved in cell adhesion (e.g., the integrins
or selectins) and signal transduction (e.g., tyrosine kinases), etc.

In another embodiment, this invention provides a method of identifying
genes that are affected by one or more drugs, or conversely, screening a number of
drugs to identify those that have an effect on particular gene(s). This involves providing
a pool of target nucleic acids from one or more cells contacted with the drug or drugs
and hybridizing that pool to any of the high density oligonucleotide arrays described
herein. The expression levels of the genes targeted by the probes in the array are
determined and compared to expression levels of genes from "control" cells not exposed
to the drug or drugs. The genes that are overexpressed or underexpressed in response to
the drug or drugs are identified or conversely the drug or drugs that alter expression of
one or more genes are identified.

In still yet another embodiment, this invention provides for a composition
comprising any of the high density oligonucleotide arrays disclosed herein where the
oligonucleotide probes are specifically hybridized to one or more fluorescently labeled
nucleic acids (which are the transcription products of genes or derived from those
transcription products) thereby forming a fluorescent array in which the fluorescence of
the array is indicative of the transcription levels of the multiplicity of genes. One of
skill will appreciate that such a hybridized array may be used as a reference, control, or
standard (e.g., provided in a kit) or may itself be a diagnostic array indicating the
expression levels of a multiplicity of genes in a sample.

This invention also provides kits for simultaneously monitoring expression
levels of a multiplicity of genes. The kits include an array of immobilized
oligonucleotide probes complementary to subsequences of the multiplicity of target
genes, as described herein. The kit may also include instructions describing the use of
the array for detection and/or quantification of expression levels of the multiplicity of
genes. The kit may additionally include one or more of the following: buffers,
hybridization mix, wash and read solutions, labels, labeling reagents (enzymes etc.),
"control" nucleic acids, software for probe selection, array reading or data analysis and
any of the other materials or reagents described herein for the practice of the claimed
methods.

In another embodiment, this invention provides for a method of selecting
a set of oligonucleotide probes that specifically bind to a target nucleic acid (e.g., a
gene or genes whose expression is to be monitored or nucleic acids derived from the
gene or its transcribed mRNA). The method involves providing a high density array of
oligonucleotide probes where the array comprises a multiplicity of probes wherein each
probe is complementary to a subsequence of the target nucleic acid. The target nucleic
acid is then hybridized to the array of oligonucleotide probes to identify and select those
probes where the difference in hybridization signal intensity between each probe and its
mismatch control is detectable (preferably greater than about 10% of the background
signal intensity, more preferably greater than about 20% of the background signal
intensity and most preferably greater than about 50% of the background signal intensity).
The method can further comprise hybridizing the array to a second pool of nucleic acids
comprising nucleic acids other than the target nucleic acids; and identifying and selecting
probes having the lowest hybridization signal and where both the probe and its mismatch
control have a hybridization intensity equal to or less than about 5 times the background
signal intensity, preferably equal to or less than about 2 times the background signal
intensity, more preferably equal to or less than about 1 times the background signal
intensity, and most preferably equal to or less than about half the background signal
intensity.

In a preferred embodiment, the multiplicity of probes can include every
different probe of length n that is complementary to a subsequence of the target nucleic
acid. The probes can range from about 10 to about 50 nucleotides in length. The array
is preferably a high density array as described above. Similarly, the hybridization
methods, conditions, times, fluid volumes, and detection methods are as described herein.

In another embodiment, the invention provides a computer-implemented
method of monitoring expression of genes comprising the steps of: receiving input of
hybridization intensities for a plurality of nucleic acid probes including pairs of perfect
match probes and mismatch probes, the hybridization intensities indicating hybridization
affinity between the plurality of nucleic acid probes and nucleic acids corresponding to a
gene, and each pair including a perfect match probe that is perfectly complementary to a
portion of the nucleic acids and a mismatch probe that differs from the perfect match
probe by at least one nucleotide; comparing the hybridization intensities of the perfect
match and mismatch probes of each pair; and indicating expression of the gene
according to results of the comparing step. Preferably, the differences between the
hybridization intensities of the perfect match and mismatch probes of each pair are
calculated.
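
The three steps of this computer-implemented method (receive intensities, compare each pair, indicate expression) might be sketched as follows. The decision rule used here, calling a gene expressed when more than a chosen fraction of pairs show a perfect match intensity exceeding the mismatch intensity by a margin, is a hypothetical rule chosen only to make the sketch concrete. The function name `indicate_expression` and both threshold parameters are likewise assumptions.

```python
def indicate_expression(pairs, min_diff=0.0, min_fraction=0.5):
    """Compare perfect match (PM) and mismatch (MM) intensities pair by pair.

    `pairs` is a sequence of (pm_intensity, mm_intensity) tuples for one gene.
    Returns the per-pair differences, the fraction of pairs whose difference
    exceeds `min_diff`, and a boolean expression call (assumed rule).
    """
    if not pairs:
        raise ValueError("at least one PM/MM pair is required")
    differences = [pm - mm for pm, mm in pairs]
    fraction = sum(1 for d in differences if d > min_diff) / len(differences)
    return {
        "differences": differences,
        "fraction_positive": fraction,
        "expressed": fraction > min_fraction,
    }


# Hypothetical input: three PM/MM pairs for one gene.
result = indicate_expression([(10.0, 2.0), (9.0, 3.0), (4.0, 5.0)])
print(result["expressed"], result["differences"])
```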

Additionally, the invention provides a computer-implemented method for
monitoring expression of genes comprising the steps of: receiving input of a nucleic
acid sequence constituting a gene; generating a set of probes that are perfectly
complementary to the gene; and identifying a subset of probes, including less than all of
the probes in the set, for monitoring the expression of the gene. Each probe of the set
may be analyzed by criteria that specify characteristics indicative of low hybridization or
high cross hybridization. The criteria may include whether the number of occurrences of
a specific nucleotide in a probe crosses a threshold value, whether the number of times a
specific nucleotide repeats sequentially in a probe crosses a threshold value, whether the
length of a palindrome in a probe crosses a threshold value, and the like.
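
A minimal Python sketch of this probe-generation and filtering step is given below. It rests on several assumptions that are not from the specification: the gene is a DNA string over A, C, G and T; probes are taken as the reverse complement of every window of a fixed length; a "palindrome" is read in the nucleic acid sense of a stretch equal to its own reverse complement; and the threshold values and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def candidate_probes(gene, length):
    """Every probe of the given length perfectly complementary to the gene."""
    return [reverse_complement(gene[i:i + length])
            for i in range(len(gene) - length + 1)]


def longest_run(seq):
    """Length of the longest stretch of one repeated nucleotide."""
    best = run = 0
    previous = None
    for base in seq:
        run = run + 1 if base == previous else 1
        previous = base
        best = max(best, run)
    return best


def longest_palindrome(seq):
    """Length of the longest self-complementary stretch (0 if none)."""
    best = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq) + 1):
            sub = seq[i:j]
            if len(sub) > best and sub == reverse_complement(sub):
                best = len(sub)
    return best


def filter_probes(gene, length, max_count, max_run, max_palindrome):
    """Keep probes that stay at or below each (hypothetical) threshold."""
    kept = []
    for probe in candidate_probes(gene, length):
        if max(probe.count(b) for b in "ACGT") > max_count:
            continue
        if longest_run(probe) > max_run:
            continue
        if longest_palindrome(probe) > max_palindrome:
            continue
        kept.append(probe)
    return kept


# Toy example with a 6-base "gene" and 4-base probes.
print(filter_probes("ACGTAC", 4, max_count=2, max_run=2, max_palindrome=3))
```

In practice the probe length would be on the order of 20 to 25 nucleotides and the thresholds would be tuned empirically; the toy values above only exercise each criterion.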



Definitions.

The phrase "massively parallel screening" refers to the simultaneous 
screening of at least about 100, preferably about 1000, more preferably about 10,000 
and most preferably about 1,000,000 different nucleic acid hybridizations. 

The terms "nucleic acid" or "nucleic acid molecule" refer to a
deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form,
and unless otherwise limited, would encompass known analogs of natural nucleotides
that can function in a similar manner as naturally occurring nucleotides.

An oligonucleotide is a single-stranded nucleic acid ranging in length 
from 2 to about 500 bases. 

As used herein a "probe" is defined as an oligonucleotide capable of
binding to a target nucleic acid of complementary sequence through one or more types of
chemical bonds, usually through complementary base pairing, usually through hydrogen
bond formation. As used herein, an oligonucleotide probe may include natural (i.e., A,
G, C, or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases
in an oligonucleotide probe may be joined by a linkage other than a phosphodiester bond,
so long as it does not interfere with hybridization. Thus, oligonucleotide probes may be
peptide nucleic acids in which the constituent bases are joined by peptide bonds rather
than phosphodiester linkages.

The term "target nucleic acid" refers to a nucleic acid (often derived from 
a biological sample), to which the oligonucleotide probe is designed to specifically 
hybridize. It is either the presence or absence of the target nucleic acid that is to be 
detected, or the amount of the target nucleic acid that is to be quantified. TTie target 
nucleic acid has a sequence that is complementary to the nucleic acid sequence of the 
corresponding probe directed to the target. The term target nucleic acid may refer to the 
specific subsequence of a larger nucleic acid to which the probe is directed or to the 
overall sequence (e.g.. gene or mRNA) whose expression level it is desired to detect. 
The difference in usage wiU be apparent from context. 

"Subsequence" refers to a sequence of nucleic acids that comprise a part 
of a longer sequence of nucleic adds. 

The term "complexity-is used here according to standard meaning of this 
terni as established by Britten « c/. Methods o/Emymol. 29:363 (1974). See, also 
Cantor and Sdiimmel Biophysical Chemistry: Part III at 1228-1230 for fimhe^ 
explanation of nucleic add complexity. 

"Bind(s) substantially- refers to complementary hybridization between a 
probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be
accommodated by reducing the stringency of the hybridization media to achieve the
desired detection of the target polynucleotide sequence.

The phrase "hybridizing specifically to" refers to the binding, duplexing,
or hybridizing of a molecule only to a particular nucleotide sequence under stringent
conditions when that sequence is present in a complex mixture (e.g., total cellular) DNA
or RNA. The term "stringent conditions" refers to conditions under which a probe will
hybridize to its target subsequence, but to no other sequences. Stringent conditions are
sequence-dependent and will be different in different circumstances. Longer sequences
hybridize specifically at higher temperatures. Generally, stringent conditions are
selected to be about 5°C lower than the thermal melting point (Tm) for the specific
sequence at a defined ionic strength and pH. The Tm is the temperature (under defined
ionic strength, pH, and nucleic acid concentration) at which 50% of the probes
complementary to the target sequence hybridize to the target sequence at equilibrium.
(As the target sequences are generally present in excess, at Tm, 50% of the probes are
occupied at equilibrium.) Typically, stringent conditions will be those in which the salt
concentration is at least about 0.01 to 1.0 M Na ion concentration (or other salts) at pH
7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., 10 to 50
nucleotides). Stringent conditions may also be achieved with the addition of
destabilizing agents such as formamide.

The term "perfect match probe" refers to a probe that has a sequence that
is perfectly complementary to a particular target sequence. The test probe is typically
perfectly complementary to a portion (subsequence) of the target sequence. The perfect
match (PM) probe can be a "test probe", a "normalization control" probe, an expression
level control probe and the like. A perfect match control or perfect match probe is,
however, distinguished from a "mismatch control" or "mismatch probe."

The term "mismatch control" or "mismatch probe" refers to probes whose
sequence is deliberately selected not to be perfectly complementary to a particular target
sequence. For each mismatch (MM) control in a high-density array there typically exists
a corresponding perfect match (PM) probe that is perfectly complementary to the same
particular target sequence. The mismatch may comprise one or more bases. While the
mismatch(es) may be located anywhere in the mismatch probe, terminal mismatches are
less desirable as a terminal mismatch is less likely to prevent hybridization of the target
sequence. In a particularly preferred embodiment, the mismatch is located at or near the
center of the probe such that the mismatch is most likely to destabilize the duplex with
the target sequence under the test hybridization conditions.
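
By way of illustration only, the derivation of a mismatch probe from its perfect match partner may be sketched in Python as follows. The helper name, the example 20-mer, and the choice of substituting the complementary base at the central position are assumptions made for this sketch; the definition above requires only that the mismatch be located at or near the center of the probe.

```python
# Hypothetical helper: build a mismatch (MM) probe from a perfect match (PM)
# probe by changing the base at the central position.
# Assumption: the central base is replaced by its complement; the text does
# not prescribe which substitution to use.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def make_mismatch_probe(pm_probe: str) -> str:
    """Return a copy of pm_probe with its central base substituted."""
    center = len(pm_probe) // 2
    substituted = COMPLEMENT[pm_probe[center]]
    return pm_probe[:center] + substituted + pm_probe[center + 1:]


pm = "ACGTACGTACGTACGTACGT"  # illustrative 20-mer, not taken from the source
mm = make_mismatch_probe(pm)
```

The resulting pair differs at exactly one central position, so that any difference in hybridization signal between the two probes can be attributed to that single base.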

The terms "background" or "background signal intensity" refer to
hybridization signals resulting from non-specific binding, or other interactions, between
the labeled target nucleic acids and components of the oligonucleotide array (e.g., the
oligonucleotide probes, control probes, the array substrate, etc.). Background signals
may also be produced by intrinsic fluorescence of the array components themselves. A
single background signal can be calculated for the entire array, or a different background
signal may be calculated for each target nucleic acid. In a preferred embodiment,
background is calculated as the average hybridization signal intensity for the lowest 5%
to 10% of the probes in the array, or, where a different background signal is calculated
for each target gene, for the lowest 5% to 10% of the probes for each gene. Of course,
one of skill in the art will appreciate that where the probes to a particular gene hybridize
well and thus appear to be specifically binding to a target sequence, they should not be
used in a background signal calculation. Alternatively, background may be calculated as
the average hybridization signal intensity produced by hybridization to probes that are
not complementary to any sequence found in the sample (e.g., probes directed to nucleic
acids of the opposite sense or to genes not found in the sample such as bacterial genes
where the sample is mammalian nucleic acids). Background can also be calculated as
the average signal intensity produced by regions of the array that lack any probes at all.
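
The first background calculation described above (the average intensity of the lowest 5% to 10% of the probes) may be sketched in Python as follows. This is a minimal sketch; the function name, the default fraction, and the example intensities are assumptions introduced for illustration.

```python
def background_lowest_fraction(intensities, fraction=0.05):
    """Average hybridization intensity of the lowest `fraction` of probes.

    `fraction` would typically lie between 0.05 and 0.10, per the definition
    of background given above.
    """
    if not intensities:
        raise ValueError("no intensities supplied")
    ordered = sorted(intensities)
    # Always use at least one probe, even for very small probe sets.
    count = max(1, int(len(ordered) * fraction))
    lowest = ordered[:count]
    return sum(lowest) / len(lowest)


signals = list(range(1, 101))  # illustrative intensities 1..100
bg = background_lowest_fraction(signals, 0.05)
```

The same function can be applied to the probes of a single gene where a separate background is wanted for each target gene.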

The term "quantifying" when used in the context of quantifying
transcription levels of a gene can refer to absolute or to relative quantification. Absolute
quantification may be accomplished by inclusion of known concentration(s) of one or
more target nucleic acids (e.g., control nucleic acids such as BioB, or known
amounts of the target nucleic acids themselves) and referencing the hybridization intensity
of unknowns with the known target nucleic acids (e.g., through generation of a standard
curve). Alternatively, relative quantification can be accomplished by comparison of
hybridization signals between two or more genes, or between two or more treatments to
quantify the changes in hybridization intensity and, by implication, transcription level.
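
Absolute quantification against a standard curve may be sketched in Python as follows. The sketch assumes that the curve is a straight line fitted by least squares on log-log axes; that choice, the function names, and the example control values are assumptions made for illustration and are not prescribed by the definition above.

```python
import math


def fit_standard_curve(concentrations, intensities):
    """Fit log10(concentration) = slope * log10(intensity) + intercept."""
    xs = [math.log10(i) for i in intensities]
    ys = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(intensity, slope, intercept):
    """Read an unknown's concentration off the fitted standard curve."""
    return 10 ** (slope * math.log10(intensity) + intercept)


# Illustrative spiked controls (concentration units are arbitrary).
known_conc = [1.0, 10.0, 100.0]
known_signal = [100.0, 1000.0, 10000.0]
slope, intercept = fit_standard_curve(known_conc, known_signal)
estimate = estimate_concentration(500.0, slope, intercept)
```

Relative quantification needs no curve: the ratio of two hybridization signals, for two genes or for two treatments, is taken directly.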

The "percentage of sequence identity" or "sequence identity" is
determined by comparing two optimally aligned sequences or subsequences over a
comparison window or span, wherein the portion of the polynucleotide sequence in the
comparison window may optionally comprise additions or deletions (i.e., gaps) as
compared to the reference sequence (which does not comprise additions or deletions) for
optimal alignment of the two sequences. The percentage is calculated by determining
the number of positions at which the identical subunit (e.g., nucleic acid base or amino
acid residue) occurs in both sequences to yield the number of matched positions, 
dividing the number of matched positions by the total number of positions in the window 
of comparison and multiplying the result by 100 to yield the percentage of sequence 
identity. Percentage sequence identity when calculated using the programs GAP or 
BESTFIT (see below) is calculated using default gap weights.

Methods of alignment of sequences for comparison are well known in the
art. Optimal alignment of sequences for comparison may be conducted by the local
homology algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981), by the
homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970),
by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. USA
85: 2444 (1988), by computerized implementations of these algorithms (including, but
not limited to CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View,
California, GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software
Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wisconsin,
USA), or by inspection. In particular, methods for aligning sequences using the
CLUSTAL program are well described by Higgins and Sharp in Gene, 73: 237-244
(1988) and in CABIOS 5: 151-153 (1989).
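
Once two sequences have been aligned by any of the foregoing methods, the percentage of sequence identity defined above may be computed as in the following Python sketch. The function name, the use of "-" as the gap character, and the example sequences are assumptions made for illustration; the alignment itself is taken as given.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of sequence identity over an already-aligned window.

    Both strings must have the same length; "-" marks a gap (assumed
    convention). The denominator is the total number of positions in the
    comparison window, as in the definition above.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty comparison window")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
```

In this example eight of the ten window positions carry the identical base, giving 80 percent identity.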

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows a schematic of expression monitoring using oligonucleotide
arrays. Extracted poly(A)+ RNA is converted to cDNA, which is then transcribed in the
presence of labeled ribonucleotide triphosphates. L is either biotin or a dye such as
fluorescein. RNA is fragmented with heat in the presence of magnesium ions.
Hybridizations are carried out in a flow cell that contains the two-dimensional DNA probe
arrays. Following a brief washing step to remove unhybridized RNA, the arrays are
scanned using a scanning confocal microscope. Alternatives in which cellular mRNA is
directly labeled without a cDNA intermediate are described in the Examples. Image
analysis software converts the scanned array images into text files in which the observed
intensities at specific physical locations are associated with particular probe sequences.

Fig. 2A shows a fluorescent image of a high density array containing over
16,000 different oligonucleotide probes. The image was obtained following hybridization
(15 hours at 40°C) of biotin-labeled randomly fragmented sense RNA transcribed from the
murine B cell (T10) cDNA library, and spiked at the level of 1:3,000 (50 pM equivalent to
about 100 copies per cell) with 13 specific RNA targets. The brightness at any location is
indicative of the amount of labeled RNA hybridized to the particular oligonucleotide probe.
Fig. 2B shows a small portion of the array (the boxed region of Fig. 2A) containing probes
for IL-2 and IL-3 RNAs. For comparison, Fig. 2C shows the same region of the
array following hybridization with an unspiked T10 RNA sample (T10 cells do not express
IL-2 and IL-3). The variation in the signal intensity was highly reproducible and reflected
the sequence dependence of the hybridization efficiencies. The central cross and the four
corners of the array contain a control sequence that is complementary to a biotin-labeled
oligonucleotide that was added to the hybridization solution at a constant concentration (50
pM). The sharpness of the images near the boundaries of the features was limited by the
resolution of the reading device (11.25 µm) and not by the spatial resolution of the array
synthesis. The pixels in the border regions of each synthesis feature were systematically
ignored in the quantitative analysis of the images.

Fig. 3 provides a log/log plot of the hybridization intensity (average of the
PM-MM intensity differences for each gene) versus concentration for 11 different RNA
targets. The hybridization signals were quantitatively related to target concentration. The
experiments were performed as described in the Examples herein and in Fig. 2. The ten
cytokine RNAs (plus bioB) were spiked into labeled T10 RNA at levels ranging from
1:300,000 to 1:3,000. The signals continued to increase with increased concentration up to
frequencies of 1:300, but the response became sublinear at the high levels due to saturation
of the probe sites. The linear range can be extended to higher concentrations by using
shorter hybridization times. RNAs from genes expressed in T10 cells (IL-10, β-actin and
GAPDH) were also detected at levels consistent with results obtained by probing cDNA
libraries.
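
The quantity plotted in Fig. 3, namely the average of the PM-MM intensity differences for a gene, may be computed as in the following Python sketch. The function name and the example intensities are hypothetical and are offered for illustration only.

```python
def average_difference(pm_intensities, mm_intensities):
    """Mean of the pairwise PM - MM intensity differences for one gene."""
    if len(pm_intensities) != len(mm_intensities):
        raise ValueError("each PM probe needs exactly one MM partner")
    if not pm_intensities:
        raise ValueError("no probe pairs supplied")
    diffs = [pm - mm for pm, mm in zip(pm_intensities, mm_intensities)]
    return sum(diffs) / len(diffs)


# Illustrative intensities for four probe pairs of a single gene.
pm_signal = [1200.0, 950.0, 1430.0, 800.0]
mm_signal = [200.0, 150.0, 430.0, 300.0]
avg_diff = average_difference(pm_signal, mm_signal)
```

Subtracting each mismatch signal from its perfect match partner removes the portion of the signal attributable to non-specific binding before the pairs are averaged.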

Fig. 4 shows cytokine mRNA levels in the murine 2D6 T helper cell line at
different times following stimulation with PMA and a calcium ionophore. Poly(A)+ RNA
was extracted at 0, 2, 6, and 24 hours following stimulation and converted to double
stranded cDNA containing an RNA polymerase promoter. The cDNA pool was then
transcribed in the presence of biotin labeled ribonucleotide triphosphates, fragmented, and
hybridized to the oligonucleotide probe arrays for 2 and 22 hours. The fluorescence
intensities were converted to RNA frequencies by comparison with the signals obtained for
a bacterial RNA (biotin synthetase) spiked into the samples at known amounts prior to
hybridization. A signal of 50,000 corresponds to a frequency of approximately 1:100,000
to a frequency of 1:5,000, and a signal of 100 to a frequency of 1:50,000. RNAs for IL-2,
IL-4, IL-6, and IL-12p40 were not detected above the level of approximately 1:200,000 in
these experiments. The error bars reflect the estimated uncertainty (25 percent) in the level
for a given RNA relative to the level for the same RNA at a different time point. The
relative uncertainty estimate was based on the results of repeated spiking experiments, and
on repeated measurements of IL-10, β-actin and GAPDH RNAs in preparations from both
T10 and 2D6 cells (unstimulated). The uncertainty in the absolute frequencies includes
message-to-message differences in the hybridization efficiency as well as differences in the
mRNA isolation, cDNA synthesis, and RNA synthesis and labeling steps. The uncertainty
in the absolute frequencies is estimated to be a factor of three.

Fig. 5 shows a fluorescence image of an array containing over 63,000
different oligonucleotide probes for 118 genes. The image was obtained following
overnight hybridization of a labeled murine B cell RNA sample. Each square synthesis
region is 50 x 50 µm and contains 10^7 to 10^8 copies of a specific oligonucleotide. The
array was scanned at a resolution of 7.5 µm in approximately 15 minutes. The bright rows
indicate RNAs present at high levels. Lower level RNAs were unambiguously detected
based on quantitative evaluation of the hybridization patterns. A total of 21 murine RNAs
were detected at levels ranging from approximately 1:300,000 to 1:100. The cross in the
center, the checkerboard in the corners, and the MUR-1 region at the top contain probes
complementary to a labeled control oligonucleotide that was added to all samples.

Fig. 6 shows an example of a computer system used to execute the software
of an embodiment of the present invention.

Fig. 7 shows a system block diagram of a typical computer system used to
execute the software of an embodiment of the present invention.

Fig. 8 shows the high level flow of a process of monitoring the expression of
a gene by comparing hybridization intensities of pairs of perfect match and mismatch
probes.

Fig. 9 shows the flow of a process of determining if a gene is expressed
utilizing a decision matrix.

Figs. 10A and 10B show the flow of a process of determining the expression
of a gene by comparing baseline scan data and experimental scan data.

Fig. 11 shows the flow of a process of increasing the number of probes for
monitoring the expression of genes after the number of probes has been reduced or pruned.

DETAILED DESCRIPTION
I. High Density Arrays For Monitoring Gene Expression

This invention provides methods of monitoring (detecting and/or 
quantifying) the expression levels of one or more genes. The methods involve 
hybridization of a nucleic acid target sample to a high density array of nucleic acid
probes and then quantifying the amount of target nucleic acids hybridized to each probe 
in the array. 

While nucleic acid hybridization has been used for some time to
determine the expression levels of various genes (e.g., Northern Blot), it was a
surprising discovery of this invention that high density arrays are suitable for the
quantification of the small variations in expression (transcription) levels of a gene in the
presence of a large population of heterogeneous nucleic acids. The signal may be present
at a concentration of less than about 1 in 1,000, and is often present at a concentration
less than 1 in 10,000, more preferably less than about 1 in 50,000 and most preferably
less than about 1 in 100,000, 1 in 300,000, or even 1 in 1,000,000.

Prior to this invention, it was expected that hybridization of such a
complex mixture to a high density array might overwhelm the available probes and make 
it impossible to detect the presence of low-level target nucleic acids. It was thus unclear 
that a low level signal could be isolated and detected in the presence of misleading 
signals due to cross-hybridization and non-specific binding both to substrate and probe. 
It was therefore a surprising discovery that, to the contrary, high density arrays are 
particularly well suited for monitoring expression of a multiplicity of genes and provide 
a level of sensitivity and discrimination hitherto unexpected. 

It was also a surprising discovery of this invention that when used in a
high-density array, even relatively short oligonucleotides can be used to accurately detect
and quantify expression (transcription) levels of genes. Thus oligonucleotide arrays
having oligonucleotides as short as 10 nucleotides, more preferably 15 nucleotides
and most preferably 20 or 25 nucleotides are used to specifically detect and quantify
gene expression levels. Of course arrays containing longer oligonucleotides, as
described herein, are also suitable.

A) Advantages of Oligonucleotide Arrays

In one preferred embodiment, the high density arrays used in the methods
of this invention comprise chemically synthesized oligonucleotides. The use of
chemically synthesized oligonucleotide arrays, as opposed to, for example, blotted arrays
of genomic clones, restriction fragments, oligonucleotides, and the like, offers numerous
advantages. These advantages generally fall into four categories:

1) Efficiency of production;

2) Reduced intra- and inter-array variability; 

3) Increased information content; and 

4) Higher signal to noise ratio (improved sensitivity). 



1) Efficiency of production.

In a preferred embodiment, the arrays are synthesized using methods of
spatially addressed parallel synthesis (see, e.g., Section V, below). The oligonucleotides
are synthesized chemically in a highly parallel fashion covalently attached to the array
surface. This allows extremely efficient array production. For example, arrays
containing tens (or even hundreds) of thousands of specifically selected 20 mer
oligonucleotides are synthesized in fewer than 80 synthesis cycles. The arrays are
designed and synthesized based on sequence information alone. Thus, unlike blotting
methods, the array preparation requires no handling of biological materials. There is no
need for cloning steps, nucleic acid amplifications, cataloging of clones or amplification
products, and the like. The preferred chemical synthesis of expression monitoring arrays
in this invention is thus more efficient than blotting methods and permits the production of
highly reproducible high-density arrays with relatively little labor and expense.



2) Reduced intra- and inter-array variability.
The use of chemically synthesized high-density oligonucleotide arrays in
the methods of this invention reduces intra- and inter-array variability. The
oligonucleotide arrays preferred for this invention are made in large batches (presently
49 arrays per wafer with multiple wafers synthesized in parallel) in a highly controlled
reproducible manner. This makes them suitable as general diagnostic and research tools
permitting direct comparisons of assays performed anywhere in the world.

Because of the precise control obtainable during the chemical synthesis the
arrays of this invention show less than about 25%, preferably less than about 20%, more
preferably less than about 15%, still more preferably less than about 10%, even more
preferably less than about 5%, and most preferably less than about 2% variation between
high density arrays (within or between production batches) having the same probe
composition. Array variation is assayed as the variation in hybridization intensity
(against a labeled control target nucleic acid mixture) in one or more oligonucleotide
probes between two or more arrays. More preferably, array variation is assayed as the
variation in hybridization intensity (against a labeled control target nucleic acid mixture)
measured for one or more target genes between two or more arrays.
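
One way in which such array-to-array variation might be expressed as a percentage is sketched below in Python. The text above does not specify the statistic; this sketch assumes the coefficient of variation (population standard deviation divided by the mean) of the intensity measured for one probe or one gene on several arrays. The function name and example intensities are likewise hypothetical.

```python
import statistics


def percent_variation(intensities_across_arrays):
    """Coefficient of variation, in percent, of one probe's (or gene's)
    intensity measured on two or more arrays.

    Assumption: the coefficient of variation stands in for "variation";
    the source does not name a specific statistic.
    """
    if len(intensities_across_arrays) < 2:
        raise ValueError("need measurements from at least two arrays")
    mean = statistics.mean(intensities_across_arrays)
    if mean == 0:
        raise ValueError("mean intensity is zero")
    return 100.0 * statistics.pstdev(intensities_across_arrays) / mean


# Illustrative control-target intensities from three arrays.
variation = percent_variation([980.0, 1000.0, 1020.0])
```

For these example values the variation is under 2%, which would fall within the most preferred range recited above.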

In addition to reducing inter- and intra-array variability, chemically
synthesized arrays also reduce variations in relative probe frequency inherent in spotting
methods, particularly spotting methods that use cell-derived nucleic acids (e.g., cDNAs).
Many genes are expressed at the level of thousands of copies per cell, while others are
expressed at only a single copy per cell. A cDNA library will reflect this very large bias
as will a cDNA library made from this material. While normalization (adjustment of
the amount of each different probe, e.g., by comparison to a reference cDNA) of the
library will reduce the representation of over-expressed sequences, normalization has
been shown to lessen the odds of selecting highly expressed cDNAs by only about a
factor of 2 or 3. In contrast, chemical synthesis methods can ensure that all
oligonucleotide probes are represented in approximately equal concentrations. This
decreases the inter-gene (intra-array) variability and permits direct comparison between
characteristically overexpressed and underexpressed nucleic acids.



3) Increased information content.

As indicated above, it was a discovery of this invention that the use of
high density oligonucleotide arrays for expression monitoring provides a number of
advantages not found with other methods. For example, the use of large numbers of
different probes that specifically bind to the transcription product of a particular target
gene provides a high degree of redundancy and internal control that permits optimization
of probe sets for effective detection of particular target genes and minimizes the
possibility of errors due to cross-reactivity with other nucleic acid species.

Apparently suitable probes often prove ineffective for expression
monitoring by hybridization. For example, certain subsequences of a particular target
gene may be found in other regions of the genome and probes directed to these
subsequences will cross-hybridize with the other regions and not provide a signal that is
a meaningful measure of the expression level of the target gene. Even probes that show
little cross reactivity may be unsuitable because they generally show poor hybridization
due to the formation of structures that prevent effective hybridization. Finally, in sets
with large numbers of probes, it is difficult to identify hybridization conditions that are
optimal for all the probes in a set. Because of the high degree of redundancy provided
by the large number of probes for each target gene, it is possible to eliminate those
probes that function poorly under a given set of hybridization conditions and still retain
enough probes to a particular target gene to provide an extremely sensitive and reliable
measure of the expression level (transcription level) of that gene.

In addition, the use of large numbers of different probes to each target
gene makes it possible to monitor expression of families of closely-related nucleic acids.
The probes may be selected to hybridize both with subsequences that are conserved
across the family and with subsequences that differ in the different nucleic acids in the
family. Thus, hybridization with such arrays permits simultaneous monitoring of the
various members of a gene family even where the various genes are approximately the
same size and have high levels of homology. Such measurements are difficult or
impossible with traditional hybridization methods.

Because the high density arrays contain such a large number of probes it
is possible to provide numerous controls including, for example, controls for variations
or mutations in a particular gene, controls for overall hybridization conditions, controls
for sample preparation conditions, controls for metabolic activity of the cell from which
the nucleic acids are derived and mismatch controls for non-specific binding or cross
hybridization.

Moreover, as explained above, it was a surprising discovery of this
invention that effective detection and quantitation of gene transcription in complex
mammalian cell message populations can be determined with relatively short
oligonucleotides and with relatively few (e.g., fewer than 40, preferably fewer than 30,
more preferably fewer than 25, and most preferably fewer than 20, 15, or even 10)
oligonucleotide probes per gene. In general, it was a discovery of this invention that
there are a large number of probes which hybridize both strongly and specifically for
each gene. This does not mean that a large number of probes is required for detection,
but rather that there are many from which to choose and that choices can be based on
other considerations such as sequence uniqueness (gene families), checking for splice
variants, or genotyping hot spots (things not easily done with cDNA spotting methods).

Based on these discoveries, sets of four arrays are made that contain 
approximately 400,000 probes each. Sets of about 40 probes (20 probe pairs) are chosen 
that are complementary to each of about 40,000 genes for which there are ESTs in the 
public database. This set of ESTs covers roughly one-third to one-half of all human
genes and these arrays will allow the levels of all of them to be monitored in a parallel 
set of overnight hybridizations. 
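
The probe budget for such a set of arrays can be checked with simple arithmetic, sketched here in Python. The variable names are hypothetical; the figures are the approximate values recited in the preceding paragraph.

```python
# Approximate figures from the paragraph above; names are illustrative.
arrays_per_set = 4
probes_per_array = 400_000
probes_per_gene = 40  # 20 probe pairs, each a PM probe plus its MM partner

total_probes = arrays_per_set * probes_per_array
genes_covered = total_probes // probes_per_gene
probe_pairs_per_gene = probes_per_gene // 2
```

Four arrays of about 400,000 probes each thus supply about 1,600,000 probes, which at about 40 probes per gene accommodates the roughly 40,000 genes mentioned.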

4) Improved signal to noise ratio.

Blotted nucleic acids typically rely on ionic, electrostatic, and
hydrophobic interactions to attach the blotted nucleic acids to the substrate. Bonds are
formed at multiple points along the nucleic acid restricting degrees of freedom and
interfering with the ability of the nucleic acid to hybridize to its complementary target.
In contrast, the preferred arrays of this invention are chemically synthesized. The
oligonucleotide probes are attached to the substrate by a single terminal covalent bond.
The probes have more degrees of freedom and are capable of participating in complex
interactions with their complementary targets. Consequently, the probe arrays of this
invention show significantly higher hybridization efficiencies (10 times, 100 times, and
even 1000 times more efficient) than blotted arrays. Less target oligonucleotide is used
to produce a given signal thereby dramatically improving the signal to noise ratio.
Consequently the methods of this invention permit detection of only a few copies of a
nucleic acid in extremely complex nucleic acid mixtures.

B) Preferred High Density Arrays

Preferred high density arrays of this invention comprise greater than about
100, preferably greater than about 1000, more preferably greater than about 16,000 and
most preferably greater than about 65,000 or 250,000 or even greater than about
1,000,000 different oligonucleotide probes. The oligonucleotide probes range from
about 5 to about 50 or about 5 to about 45 nucleotides, more preferably from about 10 to
about 40 nucleotides and most preferably from about 15 to about 40 nucleotides in
length. In particular preferred embodiments, the oligonucleotide probes are 20 or 25
nucleotides in length. It was a discovery of this invention that relatively short
oligonucleotide probes are sufficient to specifically hybridize to and distinguish target
sequences. Thus in one preferred embodiment, the oligonucleotide probes are less than
50 nucleotides in length, generally less than 46 nucleotides, more generally less than 41
nucleotides, most generally less than 36 nucleotides, preferably less than 31 nucleotides,
more preferably less than 26 nucleotides, and most preferably less than 21 nucleotides in
length. The probes can also be less than 16 nucleotides or less than even 11 nucleotides
in length.

The location and sequence of each different oligonucleotide probe
sequence in the array is known. Moreover, the large number of different probes
occupies a relatively small area providing a high density array having a probe density of
generally greater than about 60, more generally greater than about 100, most generally
greater than about 600, often greater than about 1000, more often greater than about
5,000, most often greater than about 10,000, preferably greater than about 40,000, more
preferably greater than about 100,000, and most preferably greater than about 400,000
different oligonucleotide probes per cm^2. The small surface area of the array (often less
than about 10 cm^2, preferably less than about 5 cm^2, more preferably less than about 2
cm^2, and most preferably less than about 1.6 cm^2) permits extremely uniform
hybridization conditions (temperature regulation, salt content, etc.) while the extremely
large number of probes allows massively parallel processing of hybridizations.
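
The relationship between feature size and probe density may be illustrated with the following Python sketch. It assumes square synthesis features tiled edge to edge with no gaps and one probe sequence per feature; the function name is hypothetical, and the 50 micrometer example corresponds to the feature size given for the array of Fig. 5.

```python
def probes_per_cm2(feature_edge_um: float) -> float:
    """Number of different probes per square centimeter.

    Assumes square features of the given edge length (in micrometers),
    tiled with no gaps, one probe sequence per feature.
    """
    if feature_edge_um <= 0:
        raise ValueError("feature edge must be positive")
    edge_cm = feature_edge_um * 1e-4  # 1 micrometer = 1e-4 cm
    return 1.0 / (edge_cm * edge_cm)


density_50um = probes_per_cm2(50.0)
```

Under these assumptions a 50 micrometer feature gives about 40,000 different probes per square centimeter, and halving the feature edge quadruples the density.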

Finally, because of the small area occupied by the high density arrays,
hybridization may be carried out in extremely small fluid volumes (e.g., 250 µl or less,
more preferably 100 µl or less, and most preferably 10 µl or less). In small volumes,
hybridization may proceed very rapidly. In addition, hybridization conditions are
extremely uniform throughout the sample, and the hybridization format is amenable to
automated processing.

II. Uses of Expression Monitoring.

This invention demonstrates that hybridization with high density 
oligonucleotide probe arrays provides an effective means of monitoring expression of a 
multiplicity of genes. In addition this invention provides for methods of sample 
treatment and array designs and methods of probe selection that optimize signal detection 
at extremely low concentrations in complex nucleic acid mixtures. 

The expression monitoring methods of this invention may be used in a
wide variety of circumstances including detection of disease, identification of differential
gene expression between two samples (e.g., a pathological as compared to a healthy
sample), screening for compositions that upregulate or downregulate the expression of
particular genes, and so forth.

In one preferred embodiment, the methods of this invention are used to
monitor the expression (transcription) levels of nucleic acids whose expression is altered
in a disease state. For example, a cancer may be characterized by the overexpression of
a particular marker such as the HER2 (c-erbB-2/neu) proto-oncogene in the case of
breast cancer. Similarly, overexpression of receptor tyrosine kinases (RTKs) is
associated with the etiology of a number of tumors including carcinomas of the breast,
liver, bladder, pancreas, as well as glioblastomas, sarcomas and squamous carcinomas
(see Carpenter, Ann. Rev. Biochem., 56: 881-914 (1987)). Conversely, a cancer (e.g.,
colorectal, lung and breast) may be characterized by the mutation of or underexpression
of a tumor suppressor gene such as P53 (see, e.g., Tominaga et al., Critical Rev. in
Oncogenesis, 3: 257-282 (1992)).

In another preferred embodiment, the methods of this invention are used
to monitor expression of various genes in response to defined stimuli, such as a drug.
The methods are particularly advantageous because they permit simultaneous monitoring
of the expression of thousands of genes. This is especially useful in drug research if the
end point description is a complex one, not simply asking if one particular gene is
overexpressed or underexpressed. Thus, where a disease state or the mode of action of a
drug is not well characterized, the methods of this invention allow rapid determination of
the particularly relevant genes.

As indicated above, the materials and methods of this invention are
typically used to monitor the expression of a multiplicity of different genes
simultaneously. Thus, in one embodiment, the invention provides for simultaneous
monitoring of at least about 10, preferably at least about 100, more preferably at least
about 1000, still more preferably at least about 10,000, and most preferably at least
about 100,000 different genes.

The expression monitoring methods of this invention can also be used for
gene discovery. Many genes that have been discovered to date have been classified into
families based on commonality of the sequences. Because of the extremely large number
of probes it is possible to place in the high density array, it is possible to include
oligonucleotide probes representing known or parts of known members from every gene
class. In utilizing such a "chip" (high density array), genes that are already known
would give a positive signal at loci containing both variable and common regions. For
unknown genes, only the common regions of the gene family would give a positive
signal. The result would indicate the possibility of a newly discovered gene.

The expression monitoring methods of this invention also allow the
development of "dynamic" gene databases. The Human Genome Project and
commercial sequencing projects have generated large static databases which list
thousands of sequences without regard to function or genetic interaction. Expression
analysis using the methods of this invention produces "dynamic" databases that define a
gene's function and its interactions with other genes. Without the ability to monitor the
expression of large numbers of genes simultaneously, however, the work of creating
such a database is enormous. Determining an expression pattern by DNA sequence
analysis is tedious: it involves preparing a cDNA library from the RNA
isolated from the cells of interest and then sequencing the library. As the DNA is
sequenced, the operator lists the sequences that are obtained and counts them.
Thousands of sequences would have to be determined and then the frequency of those
gene sequences would define the expression pattern of genes for the cells being studied.

By contrast, using an expression monitoring array to obtain the data
according to the methods of this invention is relatively fast and easy. The process
involves stimulating the cells to induce expression, obtaining the RNA from the cells and
then either labeling the RNA directly or creating a cDNA copy of the RNA. If cDNA is
to be hybridized to the chip, fluorescent molecules are incorporated during the DNA
polymerization. Either the labeled RNA or the labeled cDNA is then hybridized to a
high density array in one overnight experiment. The hybridization provides a
quantitative assessment of the levels of every single one of the genes with no additional
sequencing. In addition, the methods of this invention are much more sensitive, allowing
a few copies of expressed genes per cell to be detected. This procedure is demonstrated
in the examples provided herein.

Generally, the methods of monitoring gene expression of this invention
involve (1) providing a pool of target nucleic acids comprising RNA transcript(s) of one
or more target gene(s), or nucleic acids derived from the RNA transcript(s); (2)
hybridizing the nucleic acid sample to a high density array of probes (including control
probes); and (3) detecting the hybridized nucleic acids and calculating a relative
expression (transcription) level.

A) Providing a nucleic acid sample.

One of skill in the art will appreciate that in order to measure the
transcription level (and thereby the expression level) of a gene or genes, it is desirable to
provide a nucleic acid sample comprising mRNA transcript(s) of the gene or genes, or
nucleic acids derived from the mRNA transcript(s). As used herein, a nucleic acid
derived from an mRNA transcript refers to a nucleic acid for whose synthesis the mRNA
transcript or a subsequence thereof has ultimately served as a template. Thus, a cDNA
reverse transcribed from an mRNA, an RNA transcribed from that cDNA, a DNA
amplified from the cDNA, an RNA transcribed from the amplified DNA, etc., are all
derived from the mRNA transcript and detection of such derived products is indicative of
the presence and/or abundance of the original transcript in a sample. Thus, suitable
samples include, but are not limited to, mRNA transcripts of the gene or genes, cDNA
reverse transcribed from the mRNA, cRNA transcribed from the cDNA, DNA amplified
from the genes, RNA transcribed from amplified DNA, and the like.

In a particularly preferred embodiment, where it is desired to quantify the
transcription level (and thereby expression) of one or more genes in a sample, the
nucleic acid sample is one in which the concentration of the mRNA transcript(s) of the
gene or genes, or the concentration of the nucleic acids derived from the mRNA
transcript(s), is proportional to the transcription level (and therefore expression level) of
that gene. Similarly, it is preferred that the hybridization signal intensity be proportional
to the amount of hybridized nucleic acid. While it is preferred that the proportionality
be relatively strict (e.g., a doubling in transcription rate results in a doubling in mRNA
transcript in the sample nucleic acid pool and a doubling in hybridization signal), one of
skill will appreciate that the proportionality can be more relaxed and even non-linear.
Thus, for example, an assay where a 5 fold difference in concentration of the target
mRNA results in a 3 to 6 fold difference in hybridization intensity is sufficient for most
purposes. Where more precise quantification is required, appropriate controls can be run
to correct for variations introduced in sample preparation and hybridization as described
herein. In addition, serial dilutions of "standard" target mRNAs can be used to prepare
calibration curves according to methods well known to those of skill in the art. Of
course, where simple detection of the presence or absence of a transcript is desired, no
elaborate control or calibration is required.
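
By way of illustration only, the calibration described above might be sketched in Python as follows. The sketch fits a straight line to log-transformed serial-dilution data and inverts that line to estimate an unknown concentration. The function names, the example values, and the choice of a log-log linear fit are assumptions made for this illustration and are not taken from the specification.

```python
import math

def fit_calibration(concentrations, intensities):
    """Least-squares fit of log10(intensity) = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_concentration(intensity, slope, intercept):
    """Invert the fitted line to recover a concentration from a measured intensity."""
    return 10 ** ((math.log10(intensity) - intercept) / slope)

# Hypothetical serial dilution of a "standard" target (arbitrary units).
standard_conc = [1.0, 10.0, 100.0, 1000.0]
standard_signal = [20.0, 200.0, 2000.0, 20000.0]

slope, intercept = fit_calibration(standard_conc, standard_signal)
unknown_conc = estimate_concentration(600.0, slope, intercept)
print(round(slope, 3), round(unknown_conc, 3))
```

A fitted slope other than 1 would correspond to the relaxed, non-linear proportionality mentioned above; the same inversion still applies as long as the curve is monotonic over the range of the standards.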

In the simplest embodiment, such a nucleic acid sample is the total mRNA
isolated from a biological sample. The term "biological sample", as used herein, refers
to a sample obtained from an organism or from components (e.g., cells) of an organism.
The sample may be of any biological tissue or fluid. Frequently the sample will be a
"clinical sample" which is a sample derived from a patient. Such samples include, but
are not limited to, sputum, [illegible] samples, urine, peritoneal fluid, and pleural
[illegible] purposes.

The nucleic acid (either genomic DNA or mRNA) may be isolated from
the sample according to any of a number of methods well known to those of skill in the
art. One of skill will appreciate that where alterations in the copy number of a gene are
to be detected, genomic DNA is preferably isolated. Conversely, where expression levels
of a gene or genes are to be detected, preferably RNA (mRNA) is isolated.

Methods of isolating total mRNA are well known to those of skill in the
art. For example, methods of isolation and purification of nucleic acids are described in
detail in Chapter 3 of Laboratory Techniques in Biochemistry and Molecular Biology:
Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation,
P. Tijssen, ed. Elsevier, N.Y. (1993).

In a preferred embodiment, the total nucleic acid is isolated from a given
sample using, for example, an acid guanidinium-phenol-chloroform extraction method
and polyA+ mRNA is isolated by oligo dT column chromatography or by using (dT)n
magnetic beads (see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual
(2nd ed.), Vols. 1-3, Cold Spring Harbor Laboratory, (1989), or Current Protocols in
Molecular Biology, F. Ausubel, ed. Greene Publishing and Wiley-Interscience,
New York (1987)).

Frequently, it is desirable to amplify the nucleic acid sample prior to
hybridization. One of skill in the art will appreciate that whatever amplification method
is used, if a quantitative result is desired, care must be taken to use a method that
maintains or controls for the relative frequencies of the amplified nucleic acids.

Methods of "quantitative" amplification are well known to those of skill in
the art. For example, quantitative PCR involves simultaneously co-amplifying a known
quantity of a control sequence using the same primers. This provides an internal
standard that may be used to calibrate the PCR reaction. The high density array may
then include probes specific to the internal standard for quantification of the amplified
nucleic acid.

One preferred internal standard is a synthetic AW106 cRNA. The
AW106 cRNA is combined with RNA isolated from the sample according to standard
techniques known to those of skill in the art. The RNA is then reverse transcribed using
a reverse transcriptase to provide copy DNA. The cDNA sequences are then amplified
(e.g., by PCR) using labeled primers. The amplification products are separated,
typically by electrophoresis, and the amount of radioactivity (proportional to the amount
of amplified product) is determined. The amount of mRNA in the sample is then
calculated by comparison with the signal produced by the known AW106 RNA standard.
Detailed protocols for quantitative PCR are provided in PCR Protocols, A Guide to
Methods and Applications, Innis et al., Academic Press, Inc. N.Y., (1990).
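
As a minimal sketch of the comparison step described above, the following Python fragment scales the sample signal by the signal of a co-amplified standard of known quantity. It assumes, purely for illustration, that sample and standard amplify with equal efficiency and that signal is linear in product; the function name and the numbers are hypothetical.

```python
def quantify_with_internal_standard(sample_signal, standard_signal, standard_amount):
    """Estimate the sample amount from its signal relative to a known internal standard.

    Assumes equal amplification efficiency and a linear signal response.
    """
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return sample_signal / standard_signal * standard_amount

# Hypothetical measurements: the standard was added at 2.0 units.
estimated_amount = quantify_with_internal_standard(
    sample_signal=4500.0, standard_signal=1500.0, standard_amount=2.0
)
print(estimated_amount)
```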

Other suitable amplification methods include, but are not limited to,
polymerase chain reaction (PCR) (Innis, et al., PCR Protocols. A Guide to Methods and
Applications, Academic Press, Inc. San Diego, (1990)), ligase chain reaction (LCR) (see
Wu and Wallace, Genomics, 4: 560 (1989), Landegren, et al., Science, 241: 1077
(1988) and Barringer, et al., Gene, 89: 117 (1990)), transcription amplification (Kwoh,
et al., Proc. Natl. Acad. Sci. USA, 86: 1173 (1989)), and self-sustained sequence
replication (Guatelli, et al., Proc. Nat. Acad. Sci. USA, 87: 1874 (1990)).

In a particularly preferred embodiment, the sample mRNA is reverse
transcribed with a reverse transcriptase and a primer consisting of oligo dT and a
sequence encoding the phage T7 promoter to provide single stranded DNA template.
The second DNA strand is polymerized using a DNA polymerase. After synthesis of
double-stranded cDNA, T7 RNA polymerase is added and RNA is transcribed from the
cDNA template. Successive rounds of transcription from each single cDNA template
result in amplified RNA. Methods of in vitro polymerization are well known to those
of skill in the art (see, e.g., Sambrook, supra.) and this particular method is described in
detail by Van Gelder, et al., Proc. Natl. Acad. Sci. USA, 87: 1663-1667 (1990) who
demonstrate that in vitro amplification according to this method preserves the relative
frequencies of the various RNA transcripts. Moreover, Eberwine et al., Proc. Natl.
Acad. Sci. USA, 89: 3010-3014 provide a protocol that uses two rounds of amplification
via in vitro transcription to achieve greater than 10^6 fold amplification of the original
starting material thereby permitting expression monitoring even where biological
samples are limited.

It will be appreciated by one of skill in the art that the direct transcription
method described above provides an antisense (aRNA) pool. Where antisense RNA is
used as the target nucleic acid, the oligonucleotide probes provided in the array are
chosen to be complementary to subsequences of the antisense nucleic acids. Conversely,
where the target nucleic acid pool is a pool of sense nucleic acids, the oligonucleotide
probes are selected to be complementary to subsequences of the sense nucleic acids.
Finally, where the nucleic acid pool is double stranded, the probes may be of either
sense as the target nucleic acids include both sense and antisense strands.
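
The relationship between the sense of the target pool and the probe sequence can be illustrated with a short Python sketch. The function names and the example subsequence are hypothetical, and the sketch assumes DNA probes written 5' to 3' and a sense subsequence given in RNA or DNA letters.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(dna):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def probes_for_pool(sense_subsequence, pool_sense):
    """Return DNA probe(s) able to hybridize to a target pool of the given sense.

    pool_sense is 'sense', 'antisense', or 'double' (both strands present).
    """
    sense_dna = sense_subsequence.upper().replace("U", "T")
    probe_for_sense_target = reverse_complement(sense_dna)
    probe_for_antisense_target = sense_dna  # complementary to the antisense strand
    if pool_sense == "sense":
        return [probe_for_sense_target]
    if pool_sense == "antisense":
        return [probe_for_antisense_target]
    if pool_sense == "double":
        return [probe_for_sense_target, probe_for_antisense_target]
    raise ValueError("pool_sense must be 'sense', 'antisense', or 'double'")

# Hypothetical six-base subsequence of a sense transcript.
print(probes_for_pool("AUGGCC", "antisense"))
print(probes_for_pool("AUGGCC", "sense"))
```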

The protocols cited above include methods of generating pools of either
sense or antisense nucleic acids. Indeed, one approach can be used to generate either
sense or antisense nucleic acids as desired. For example, the cDNA can be directionally
cloned into a vector (e.g., Stratagene's pBluescript II KS (+) phagemid) such that it is
flanked by the T3 and T7 promoters. In vitro transcription with the T3 polymerase will
produce RNA of one sense (the sense depending on the orientation of the insert), while
in vitro transcription with the T7 polymerase will produce RNA having the opposite
sense. Other suitable cloning systems include phage lambda vectors designed for
Cre-loxP plasmid subcloning (see e.g., Palazzolo et al., Gene, 88: 25-36 (1990)).

In a particularly preferred embodiment, a high activity RNA polymerase
(e.g., about 2500 units/µL for T7, available from Epicentre Technologies) is used.

In a preferred embodiment, the hybridized nucleic acids are detected by
detecting one or more labels attached to the sample nucleic acids. The labels may be
incorporated by any of a number of means well known to those of skill in the art.
However, in a preferred embodiment, the label is simultaneously incorporated during the
amplification step in the preparation of the sample nucleic acids. Thus, for example,
polymerase chain reaction (PCR) with labeled primers or labeled nucleotides will
provide a labeled amplification product. In a preferred embodiment, transcription
amplification, as described above, using a labeled nucleotide (e.g., fluorescein-labeled
UTP and/or CTP) incorporates a label into the transcribed nucleic acids.

Alternatively, a label may be added directly to the original nucleic acid
sample (e.g., mRNA, polyA mRNA, cDNA, etc.) or to the amplification product after
the amplification is completed. Means of attaching labels to nucleic acids are well
known to those of skill in the art and include, for example, nick translation or
end-labeling (e.g., with a labeled RNA) by kinasing of the nucleic acid and subsequent
attachment (ligation) of a nucleic acid linker joining the sample nucleic acid to a label
(e.g., a fluorophore).

Detectable labels suitable for use in the present invention include any
composition detectable by spectroscopic, photochemical, biochemical, immunochemical,
electrical, optical or chemical means. Useful labels in the present invention include
biotin for staining with labeled streptavidin conjugate, magnetic beads (e.g.,
Dynabeads™), fluorescent dyes (e.g., fluorescein, texas red, rhodamine, green
fluorescent protein, and the like), radiolabels (e.g., ³H, ³⁵S, ¹⁴C, or ³²P), enzymes
(e.g., horse radish peroxidase, alkaline phosphatase and others commonly used in an
ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic (e.g.,
polystyrene, polypropylene, latex, etc.) beads. Patents teaching the use of such labels
include U.S. Patent Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437;
4,275,149; and 4,366,241.

Means of detecting such labels are well known to those of skill in the art.
Thus, for example, radiolabels may be detected using photographic film or scintillation
counters, and fluorescent markers may be detected using a photodetector to detect emitted
light. Enzymatic labels are typically detected by providing the enzyme with a substrate
and detecting the reaction product produced by the action of the enzyme on the substrate,
and colorimetric labels are detected by simply visualizing the colored label.

The label may be added to the target (sample) nucleic acid(s) prior to, or
after the hybridization. So called "direct labels" are detectable labels that are directly
attached to or incorporated into the target (sample) nucleic acid prior to hybridization.
In contrast, so called "indirect labels" are joined to the hybrid duplex after hybridization.
Often, the indirect label is attached to a binding moiety that has been attached to the
target nucleic acid prior to the hybridization. Thus, for example, the target nucleic acid
may be biotinylated before the hybridization. After hybridization, an avidin-conjugated
fluorophore will bind the biotin bearing hybrid duplexes providing a label that is easily
detected. For a detailed review of methods of labeling nucleic acids and detecting labeled
hybridized nucleic acids see Laboratory Techniques in Biochemistry and Molecular
Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier,
N.Y., (1993).

Fluorescent labels are preferred and easily added during an in vitro 
transcription reaction. In a preferred embodiment, fluorescein labeled UTP and CTP are 
incorporated into the RNA produced in an in vitro transcription reaction as described 
above. 

C) Modifying sample to improve signal/noise ratio.

The nucleic acid sample may be modified prior to hybridization to the
high density probe array in order to reduce sample complexity thereby decreasing
background signal and improving sensitivity of the measurement. In one embodiment,
complexity reduction is achieved by selective degradation of background mRNA. This
is accomplished by hybridizing the sample mRNA (e.g., polyA+ RNA) with a pool of
DNA oligonucleotides that hybridize specifically with the regions to which the probes in
the array specifically hybridize. In a preferred embodiment, the pool of oligonucleotides
consists of the same probe oligonucleotides as found on the high density array.

The pool of oligonucleotides hybridizes to the sample mRNA forming a
number of double stranded (hybrid duplex) nucleic acids. The hybridized sample is then
treated with RNase A, a nuclease that specifically digests single stranded RNA. The
RNase A is then inhibited, using a protease and/or commercially available RNase
inhibitors, and the double stranded nucleic acids are then separated from the digested
single stranded RNA. This separation may be accomplished in a number of ways well
known to those of skill in the art including, but not limited to, electrophoresis, and
gradient centrifugation. However, in a preferred embodiment, the pool of DNA
oligonucleotides is provided attached to beads forming thereby a nucleic acid affinity
column. After digestion with the RNase A, the hybridized DNA is removed simply by
denaturing (e.g., by adding heat or increasing salt) the hybrid duplexes and washing the
previously hybridized mRNA off in an elution buffer.

The undigested mRNA fragments which will be hybridized to the probes 
in the high density array are then preferably end-labeled with a fluorophore attached to
an RNA linker using an RNA ligase. This procedure produces a labeled sample RNA 
pool in which the nucleic acids that do not correspond to probes in the array are 
eliminated and thus unavailable to contribute to a background signal. 

Another method of reducing sample complexity involves hybridizing the
mRNA with deoxyoligonucleotides that hybridize to regions that border on either side of
the regions to which the high density array probes are directed. Treatment with RNase
H selectively digests the double stranded (hybrid duplexes) leaving a pool of
single-stranded mRNA corresponding to the short regions (e.g., 20 mer) that were formerly
bounded by the deoxyoligonucleotide probes and which correspond to the targets of the
high density array probes and longer mRNA sequences that correspond to regions
between the targets of the probes of the high density array. The short RNA fragments
are then separated from the long fragments (e.g., by electrophoresis), labeled if
necessary as described above, and then are ready for hybridization with the high density
probe array.

In a third approach, sample complexity reduction involves the selective
removal of particular (preselected) mRNA messages. In particular, highly expressed
mRNA messages that are not specifically probed by the probes in the high density array
are preferably removed. This approach involves hybridizing the polyA+ mRNA with an
oligonucleotide probe that specifically hybridizes to the preselected message close to the
3' (poly A) end. The probe may be selected to provide high specificity and low cross
reactivity. Treatment of the hybridized message/probe complex with RNase H digests
the double stranded region effectively removing the polyA+ tail from the rest of the
message. The sample is then treated with methods that specifically retain or amplify
polyA+ RNA (e.g., an oligo dT column or (dT)n magnetic beads). Such methods will
not retain or amplify the selected message(s) as they are no longer associated with a
polyA+ tail. These highly expressed messages are effectively removed from the sample
providing a sample that has reduced background mRNA.

IV. Hybridization Array Design.

A) Probe composition.

One of skill in the art will appreciate that an enormous number of array
designs are suitable for the practice of this invention. The high density array will
typically include a number of probes that specifically hybridize to the nucleic acid(s)
expression of which is to be detected. In addition, in a preferred embodiment, the array
will include one or more control probes.

1) Test probes.

In its simplest embodiment, the high density array includes "test probes."
These are oligonucleotides that range from about 5 to about 45 or 5 to about 50
nucleotides, more preferably from about 10 to about 40 nucleotides and most preferably
from about 15 to about 40 nucleotides in length. In other particularly preferred
embodiments the probes are 20 or 25 nucleotides in length. These oligonucleotide
probes have sequences complementary to particular subsequences of the genes whose
expression they are designed to detect. Thus, the test probes are capable of specifically
hybridizing to the target nucleic acid they are to detect.

In addition to test probes that bind the target nucleic acid(s) of interest,
the high density array can contain a number of control probes. The control probes fall
into three categories referred to herein as 1) Normalization controls; 2) Expression level
controls; and 3) Mismatch controls.

Normalization controls are oligonucleotide probes that are perfectly
complementary to labeled reference oligonucleotides that are added to the nucleic acid
sample. The signals obtained from the normalization controls after hybridization
provide a control for variations in hybridization conditions, label intensity, "reading"
efficiency and other factors that may cause the signal of a perfect hybridization to vary
between arrays. In a preferred embodiment, signals (e.g., fluorescence intensity) read
from all other probes in the array are divided by the signal (e.g., fluorescence intensity)
from the control probes thereby normalizing the measurements.
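
The division described above can be sketched in Python as follows. Averaging several normalization controls into one reference value is an assumption of this sketch (the text states only that signals are divided by the signal from the control probes), and the probe identifiers and intensities are hypothetical.

```python
def normalize_signals(signals, control_ids):
    """Divide every non-control probe signal by the mean normalization-control signal."""
    control_values = [signals[probe_id] for probe_id in control_ids]
    reference = sum(control_values) / len(control_values)
    return {
        probe_id: value / reference
        for probe_id, value in signals.items()
        if probe_id not in control_ids
    }

# Hypothetical fluorescence intensities read from one array.
raw_signals = {"norm_1": 1000.0, "norm_2": 1200.0, "geneA_1": 550.0, "geneB_1": 2200.0}
normalized = normalize_signals(raw_signals, control_ids={"norm_1", "norm_2"})
print(normalized)
```

Because each array is scaled by its own controls, normalized values from different arrays can be compared even when label intensity or reading efficiency differs between them.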

Virtually any probe may serve as a normalization control. However, it is
recognized that hybridization efficiency varies with base composition and probe length.
Preferred normalization probes are selected to reflect the average length of the other
probes present in the array; however, they can be selected to cover a range of lengths.
The normalization control(s) can also be selected to reflect the (average) base
composition of the other probes in the array; however, in a preferred embodiment, only
one or a few normalization probes are used and they are selected such that they hybridize
well (i.e., no secondary structure) and do not match any target-specific probes.

Normalization probes can be localized at any position in the array or at
multiple positions throughout the array to control for spatial variation in hybridization
efficiency. In a preferred embodiment, the normalization controls are located at the
corners or edges of the array as well as in the middle.


3) Expression level controls.

Expression level controls are probes that hybridize specifically with
constitutively expressed genes in the biological sample. Expression level controls are
designed to control for the overall health and metabolic activity of a cell. Examination
of the covariance of an expression level control with the expression level of the target
nucleic acid indicates whether measured changes or variations in expression level of a
gene are due to changes in transcription rate of that gene or to general variations in health
of the cell. Thus, for example, when a cell is in poor health or lacking a critical
metabolite the expression levels of both an active target gene and a constitutively
expressed gene are expected to decrease. The converse is also true. Thus where the
expression levels of both an expression level control and the target gene appear to both
decrease or to both increase, the change may be attributed to changes in the metabolic
activity of the cell as a whole, not to differential expression of the target gene in
question. Conversely, where the expression levels of the target gene and the expression
level control do not covary, the variation in the expression level of the target gene is

Sample preparation/amplification controls.

The high density array may also include sample preparation/amplification
control probes. These are probes that are complementary to subsequences of control
genes selected because they do not normally occur in the nucleic acids of the particular
biological sample being assayed. Suitable sample preparation/amplification control
probes include, for example, probes to bacterial genes (e.g., Bio B) where the sample in
question is a biological sample from a eukaryote.

The RNA sample is then spiked with a known amount of the nucleic acid
to which the sample preparation/amplification control probe is directed before
processing. Quantification of the hybridization of the sample preparation/amplification
control probe then provides a measure of alteration in the abundance of the nucleic acids
caused by processing steps (e.g., PCR, reverse transcription, in vitro transcription, etc.).

Probe Selection and Optimization.

In a preferred embodiment, oligonucleotide probes in the high density
array are selected to bind specifically to the nucleic acid target to which they are directed
with minimal non-specific binding or cross-hybridization under the particular
hybridization conditions utilized. Because the high density arrays of this invention can
contain in excess of 1,000,000 different probes, it is possible to provide every probe of a
characteristic length that binds to a particular nucleic acid sequence. Thus, for example,
the high density array can contain every possible 20 mer sequence complementary to an
IL-2 mRNA.
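
By way of illustration, enumerating every probe of a given length complementary to a transcript can be sketched in Python as below. The function name is hypothetical and the 24-base input is an arbitrary made-up sequence, not the IL-2 mRNA; a transcript of N bases yields N - 20 + 1 overlapping 20 mer probes.

```python
RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}

def tile_probes(mrna, length=20):
    """Return (start, probe) pairs: one DNA probe (5'->3') per window of the transcript."""
    mrna = mrna.upper()
    probes = []
    for start in range(len(mrna) - length + 1):
        window = mrna[start:start + length]
        probe = "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(window))
        probes.append((start, probe))
    return probes

# Arbitrary 24-base example sequence (hypothetical, for illustration only).
example_mrna = "AUGGCUAGCUUACGGAUCCGAUAG"
tiled = tile_probes(example_mrna, length=20)
print(len(tiled))
print(tiled[0])
```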

There may, however, exist 20 mer subsequences that are not unique to the
IL-2 mRNA. Probes directed to these subsequences are expected to cross hybridize with
occurrences of their complementary sequence in other regions of the sample genome.
Similarly, other probes simply may not hybridize effectively under the hybridization
conditions (e.g., due to secondary structure, or interactions with the substrate or other
probes). Thus, in a preferred embodiment, the probes that show such poor specificity or
hybridization efficiency are identified and may not be included either in the high density
array itself (e.g., during fabrication of the array) or in the post-hybridization data
analysis.
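
One simple way to flag the non-unique subsequences mentioned above is an exact-match screen against other sequences expected in the sample. The Python sketch below does only that; it is an assumption of this illustration that exact matching is an adequate first filter (near matches and secondary structure are not considered), and the sequences and function name are hypothetical.

```python
def non_unique_windows(target, background_sequences, length=20):
    """Return start positions in `target` whose window also occurs in the background."""
    background_windows = set()
    for sequence in background_sequences:
        for i in range(len(sequence) - length + 1):
            background_windows.add(sequence[i:i + length])
    return [
        start
        for start in range(len(target) - length + 1)
        if target[start:start + length] in background_windows
    ]

# Short hypothetical sequences; a window of 5 keeps the example readable.
shared_starts = non_unique_windows("AUGGCUAGCU", ["CCGGCUAGAA"], length=5)
print(shared_starts)
```

Probes directed to the flagged windows could then be omitted from the array or excluded in the post-hybridization analysis, as the paragraph above describes.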




hundred base pairs long. For most applications it would be useful to identify the
[illegible] or expression level of several thousand to [illegible]
gene. Because the number of oligonucleotides per array is limited in a preferred
embodiment, it is desired to include only a limited set of probes specific to each gene
whose expression is to be detected.

It is a discovery of this invention [remainder of sentence illegible].

1) Hybridization and Cross-Hybridization Data.
Thus, in one embodiment, this invention provides for a method of
[illegible] for detection of a particular gene. Generally, this method
involves providing a high density array containing a multiplicity of probes of one or
more particular length(s) that are complementary to subsequences of the mRNA
[illegible] gene. In one embodiment the high density array may contain
every probe of a particular length that is complementary to a particular mRNA.
The probes of the high density array are then hybridized with their target nucleic acid
alone and then hybridized with a high complexity, high concentration nucleic acid sample
that does not contain the targets complementary to the probes. Thus, for example, where
the target nucleic acid is an RNA, the probes are first hybridized with their target nucleic
acid alone and then hybridized with RNA made from a cDNA library (e.g.,
reverse transcribed polyA+ mRNA) where the sense of the hybridized RNA is opposite that
of the target nucleic acid (to insure that the high complexity sample does not contain
targets for the probes). Those probes that show a strong hybridization signal with their
target and little or no cross-hybridization with the high complexity sample are preferred
probes for use in the high density arrays of this invention.

The high density array may additionally contain mismatch controls for
each of the probes to be tested. In a preferred embodiment, the mismatch controls
contain a central mismatch. Where both the mismatch control and the target probe show
high levels of hybridization (e.g., the hybridization to the mismatch is nearly equal to or
greater than the hybridization to the corresponding test probe), the test probe is
preferably not used in the high density array.
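
A central mismatch control and the accompanying rejection test might be sketched in Python as follows. Two details are assumptions of this sketch and are not stated in the text above: the mismatch is made by substituting the complement of the central base, and "nearly equal" is taken to mean a mismatch signal of at least 90% of the test probe signal. All names and values are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def central_mismatch(probe):
    """Copy of `probe` with its central base replaced (assumed: by its complement)."""
    middle = len(probe) // 2
    return probe[:middle] + COMPLEMENT[probe[middle]] + probe[middle + 1:]

def keep_test_probe(test_signal, mismatch_signal, ratio_cutoff=0.9):
    """Reject a test probe whose mismatch control hybridizes nearly as well or better."""
    return mismatch_signal < ratio_cutoff * test_signal

test_probe = "ACGTACGTACGTACGTACGT"  # hypothetical 20 mer
mismatch_probe = central_mismatch(test_probe)
print(mismatch_probe)
print(keep_test_probe(1000.0, 950.0), keep_test_probe(1000.0, 200.0))
```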

In a particularly preferred embodiment, optimal probes are selected
according to the following method: First, as indicated above, an array is provided
containing a multiplicity of oligonucleotide probes complementary to subsequences of
the target nucleic acid. The oligonucleotide probes may be of a single length or may
span a variety of lengths ranging from 5 to 50 nucleotides. The high density array may
contain every probe of a particular length that is complementary to a particular mRNA
or may contain probes selected from various regions of particular mRNAs. For each
target-specific probe the array also contains a mismatch control probe, preferably a
central mismatch control probe.

The oligonucleotide array is hybridized to a sample containing target
nucleic acids having subsequences complementary to the oligonucleotide probes and the
difference in hybridization intensity between each probe and its mismatch control is
determined. Only those probes where the difference between the probe and its mismatch
control exceeds a threshold hybridization intensity (e.g., preferably greater than 10% of
the background signal intensity, more preferably greater than 20% of the background
signal intensity and most preferably greater than 50% of the background signal intensity)
are selected. Thus, only probes that show a strong signal compared to their mismatch
control are selected.
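
The selection criterion above can be illustrated with a brief Python sketch. The function name, the probe identifiers, and the intensities are hypothetical; the `fraction` parameter corresponds to the 10%, 20%, or 50% of background figures given in the text.

```python
def select_probes(probe_signal, mismatch_signal, background, fraction=0.5):
    """Keep probes whose (probe - mismatch) difference exceeds fraction * background."""
    threshold = fraction * background
    return [
        probe_id
        for probe_id in probe_signal
        if probe_signal[probe_id] - mismatch_signal[probe_id] > threshold
    ]

# Hypothetical intensities for three probes and their mismatch controls.
probe_signal = {"p1": 500.0, "p2": 150.0, "p3": 120.0}
mismatch_signal = {"p1": 100.0, "p2": 120.0, "p3": 130.0}

strict = select_probes(probe_signal, mismatch_signal, background=100.0, fraction=0.5)
relaxed = select_probes(probe_signal, mismatch_signal, background=100.0, fraction=0.1)
print(strict, relaxed)
```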

The probe optimization procedure can optionally include a second round
of selection. In this selection, the oligonucleotide probe array is hybridized with a
nucleic acid sample that is not expected to contain sequences complementary to the
probes. Thus, for example, where the probes are complementary to the RNA sense
strand a sample of antisense RNA is provided. Of course, other samples could be
provided such as samples from organisms or cell lines known to be lacking a particular
gene, or known for not expressing a particular gene.



2) Heuristic rules. 

Using the hybridization and cross-hybridization data obtained as described, 
[illegible] probe properties (number of As, number of Cs in a [illegible]) [illegible] 
properties and the hybridization or cross-hybridization intensities [illegible]. 

Hybridization rules: 

1) Number of As is less than 9. 

2) Number of Ts is less than 10 and greater than 0. 

3) Maximum run of As, Gs, or Ts is less than 4 bases in a row. 

4) Maximum run of any 2 bases is less than 11 bases. 

5) Palindrome score is less than 6. 



6) Clumping score is less than 6. 

7) Number of As + Number of Ts is less than 14 

8) Number of As + number of Gs is less than 15 

With respect to rule number 4, requiring the maximum run of any two bases to be less 
than 11 bases guarantees that at least three different bases occur within any 12 
consecutive nucleotides. A palindrome score is the maximum number of complementary 
bases if the oligonucleotide is folded over at a point that maximizes self 
complementarity. Thus, for example, a 20 mer that is perfectly self-complementary 
would have a palindrome score of 10. A clumping score is the maximum number of 
three-mers of identical bases in a given sequence. Thus, for example, a run of 5 
identical bases will produce a clumping score of 3 (bases 1-3, bases 2-4, and bases 3-5). 
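
The two scores defined above might be computed as in the following sketch. It rests on stated assumptions: sequences are lowercase strings over a, c, g and t; the clumping score is taken as the count of positions at which three identical bases begin; and the fold is taken between two adjacent bases (no unpaired loop base). The function names are hypothetical.

```c
#include <string.h>

/* Number of three-mers of identical bases, counted by start position. */
int clumping_score(const char *seq)
{
    size_t n = strlen(seq);
    int score = 0;

    for (size_t i = 0; i + 2 < n; i++)
        if (seq[i] == seq[i + 1] && seq[i] == seq[i + 2])
            score++;
    return score;
}

/* Nonzero when x and y form a Watson-Crick pair. */
static int is_complement(char x, char y)
{
    return (x == 'a' && y == 't') || (x == 't' && y == 'a') ||
           (x == 'c' && y == 'g') || (x == 'g' && y == 'c');
}

/* Maximum number of complementary base pairs over all fold points.
   The fold lies between positions k-1 and k; base k-1-j pairs with k+j. */
int palindrome_score(const char *seq)
{
    size_t n = strlen(seq);
    int best = 0;

    for (size_t k = 1; k < n; k++) {
        int pairs = 0;
        for (size_t j = 0; j < k && k + j < n; j++)
            if (is_complement(seq[k - 1 - j], seq[k + j]))
                pairs++;
        if (pairs > best)
            best = pairs;
    }
    return best;
}
```

Under these assumptions a run of five identical bases scores 3 for clumping and a perfectly self-complementary 20 mer scores 10 for the palindrome score, in agreement with the examples in the text.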

If any probe failed one of these criteria (1-8), the probe was not a 
member of the subset of probes placed on the chip. For example, if a hypothetical probe 
was 5'-AGCTTTTTTCATGCATCTAT-3' the probe would not be synthesized on the 
chip because it has a run of four or more bases (i.e., a run of six Ts). 

The cross hybridization rules developed for 20 mers were as follows: 

1) Number of Cs is less than 8; 

2) Number of Cs in any window of 8 bases is less than 4. 

Thus, if any probe failed any of either the hybridization rules (1-8) or the 
cross-hybridization rules (1-2), the probe was not a member of the subset of probes 
placed on the chip. These rules eliminated many of the probes that cross hybridized 
strongly or exhibited low hybridization, and performed a moderate job of eliminating 
weakly hybridizing probes. 
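
A software check of these rules might look like the sketch below. It is illustrative only: the function names are hypothetical, input is assumed to be a lowercase sequence over a, c, g and t, and hybridization rules 5 and 6 (palindrome and clumping scores) are left out to keep the example short.

```c
#include <string.h>

/* Longest run of one repeated base, counting only bases listed in set. */
static int max_run_in(const char *s, const char *set)
{
    int best = 0, run = 0;

    for (size_t i = 0; s[i] != '\0'; i++) {
        run = (i > 0 && s[i] == s[i - 1]) ? run + 1 : 1;
        if (strchr(set, s[i]) != NULL && run > best)
            best = run;
    }
    return best;
}

/* Longest stretch that contains at most two distinct bases. */
static int max_two_base_run(const char *s)
{
    size_t n = strlen(s);
    int best = 0;

    for (size_t i = 0; i < n; i++) {
        char a = s[i], b = '\0';
        size_t j = i;
        for (; j < n; j++) {
            if (s[j] == a || s[j] == b) continue;
            if (b == '\0') { b = s[j]; continue; }
            break;
        }
        if ((int)(j - i) > best)
            best = (int)(j - i);
    }
    return best;
}

/* Returns 1 when the probe passes hybridization rules 1-4, 7, 8 and
   cross-hybridization rules 1-2; returns 0 otherwise. */
int passes_rules(const char *p)
{
    size_t n = strlen(p);
    int a = 0, c = 0, g = 0, t = 0;

    for (size_t i = 0; i < n; i++) {
        switch (p[i]) {
        case 'a': a++; break;
        case 'c': c++; break;
        case 'g': g++; break;
        case 't': t++; break;
        default:  return 0;              /* not a lowercase DNA base */
        }
    }
    if (a >= 9) return 0;                        /* rule 1 */
    if (t >= 10 || t <= 0) return 0;             /* rule 2 */
    if (max_run_in(p, "agt") >= 4) return 0;     /* rule 3 */
    if (max_two_base_run(p) >= 11) return 0;     /* rule 4 */
    if (a + t >= 14) return 0;                   /* rule 7 */
    if (a + g >= 15) return 0;                   /* rule 8 */
    if (c >= 8) return 0;                        /* cross-hyb rule 1 */
    for (size_t i = 0; i + 8 <= n; i++) {        /* cross-hyb rule 2 */
        int cw = 0;
        for (size_t j = i; j < i + 8; j++)
            if (p[j] == 'c') cw++;
        if (cw >= 4) return 0;
    }
    return 1;
}
```

With this sketch, the hypothetical probe discussed above (written in lowercase) is rejected, since its run of six Ts violates rule 3.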

These heuristic rules may be implemented by hand calculations, or 
alternatively, they may be implemented in software as is discussed below in Section 
IV.B.7. 

3) Neural net. 

In another embodiment, a neural net can be trained to predict the 
hybridization and cross-hybridization intensities based on the sequence of the probe or 
on other probe properties. The neural net can then be used to pick an arbitrary number 




4) ANOVA Model. 

An analysis of variance (ANOVA) model may be built to model the 
intensities based on positions of consecutive base pairs. [The remainder of this 
passage is illegible in the source.] 






probe from gene 1:  aagcgcgatcgattatgctc 
gene 2:             atctcggatcgatcggataagcgcgatcgattatgctcggcga 

has 8 matching bases in this alignment, but 20 matching bases in the following 
alignment: 

probe from gene 1:                    aagcgcgatcgattatgctc 
gene 2:             atctcggatcgatcggataagcgcgatcgattatgctcggcga 

More complicated algorithms also exist, which allow the detection of insertion or 
deletion mismatches. Such sequence alignment algorithms are well known to those of 
skill in the art and include, but are not limited to BLAST, or FASTA, or other gene 
matching programs such as those described above in the definitions section. 
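
The simple ungapped comparison illustrated by the two alignments can be sketched as follows. The function names are hypothetical, and the sketch only slides the probe along the second gene without insertions or deletions; it is not a substitute for the alignment programs named above.

```c
#include <string.h>

/* Number of identical bases when probe is laid on target at offset. */
int matches_at(const char *probe, const char *target, size_t offset)
{
    size_t plen = strlen(probe), tlen = strlen(target);
    int m = 0;

    for (size_t i = 0; i < plen && offset + i < tlen; i++)
        if (probe[i] == target[offset + i])
            m++;
    return m;
}

/* Largest match count over every offset at which the whole probe fits;
   the first offset achieving it is stored in *best_offset. */
int best_ungapped_match(const char *probe, const char *target,
                        size_t *best_offset)
{
    size_t plen = strlen(probe), tlen = strlen(target);
    int best = 0;

    *best_offset = 0;
    if (plen > tlen)
        return matches_at(probe, target, 0);
    for (size_t off = 0; off + plen <= tlen; off++) {
        int m = matches_at(probe, target, off);
        if (m > best) {
            best = m;
            *best_offset = off;
        }
    }
    return best;
}
```

For the probe and gene 2 shown above, matches_at at offset 0 gives 8 and best_ungapped_match gives 20 at offset 18.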



In another variant, where an organism has many different genes which are 
very similar, it is difficult to make a probe set that measures the concentration of only one 
of those very similar genes. One can then prune out any probes which are dissimilar, 
and make the probe set a probe set for that family of genes. 

6) Synthesis cycle pruning. 

The cost of producing masks for a chip is approximately linearly related to 
the number of synthesis cycles. In a normal set of genes the distribution of the number of 
cycles any probe takes to build approximates a Gaussian distribution. Because of this the 
mask cost can normally be reduced by 15% by throwing out about 3 percent of the probes. 
In a preferred embodiment, synthesis cycle pruning simply involves eliminating (not 
including) those probes that require a greater number of synthesis cycles than 
the maximum number of synthesis cycles selected for preparation of the particular subject 
high density oligonucleotide array. Since the typical synthesis of probes follows a regular 
pattern of bases put down (acgtacgtacgt...), counting the number of synthesis steps needed 
to build a probe is easy. The listing shown in Table 1 provides typical code for counting the 
number of synthesis cycles a probe will need. 

Table 1. Typical code for counting the number of synthesis cycles a probe will need. 



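#include <ctype.h>
#include <string.h>

/* Error-reporting routine used by the listing; defined elsewhere. */
void errorHwnd( const char *msg );

static char base[] = "acgt";

/* Position of each lowercase letter in the synthesis order "acgt", or -1
   if the letter is not a DNA base. The table values are illegible in the
   source and are reconstructed here. The array is named baseIndex rather
   than index to avoid clashing with the index() some C libraries declare. */
static short baseIndex[26] = {
     0, -1,  1, -1, -1, -1,  2, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1,  3, -1, -1, -1, -1, -1, -1 };

short lookupIndex( char aBase ){
    if( isupper( (unsigned char)aBase ) || !isalpha( (unsigned char)aBase ) ){
        errorHwnd( "illegal base" );
        return -1;
    }
    if( strchr( base, aBase ) == NULL ){
        errorHwnd( "non-dna base" );
        return 0;
    }
    return baseIndex[ aBase - 'a' ];
}

/* The function header is illegible in the source; the name and signature
   below are placeholders. */
short countSynthesisCycles( const char *buffer ){
    char buffer1[40];
    short i, last, current, cycles = 1;

    /* Complement the input sequence into buffer1 (at most 39 bases). */
    for( i = 0; buffer[i] != 0 && i < 39; i++ ){
        switch( tolower( (unsigned char)buffer[i] ) ){
            case 'a': buffer1[i] = 't'; break;
            case 'c': buffer1[i] = 'g'; break;
            case 'g': buffer1[i] = 'c'; break;
            case 't': buffer1[i] = 'a'; break;
            default:  buffer1[i] = buffer[i]; break; /* rejected by lookupIndex */
        }
    }
    buffer1[i] = 0;

    if( buffer1[0] == 0 ) return 0;
    last = current = lookupIndex( buffer1[0] );
    for( i = 1; buffer1[i] != 0; i++ ){
        current = lookupIndex( buffer1[i] );
        /* A base that does not come later in the a-c-g-t round than its
           predecessor requires a new round. */
        if( current <= last ) cycles++;
        last = current;
    }
    return (short)((cycles-1)*4 + current + 1);
}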



7) Combination of selection methods. 

The heuristic rules, neural net and ANOVA model provide ways of pruning 
probes. Because these methods 
do not necessarily produce the same results, or produce entirely independent results, it may 
be advantageous to combine the methods. For example, probes may be pruned or reduced 
if more than one method (e.g., two out of three) indicates the probe will not likely produce 
good results. Then, synthesis cycle pruning may be performed to reduce costs. 
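
The "two out of three" combination can be expressed as a simple vote count, as in the sketch below. The function name and the representation of each method's verdict as an integer flag are assumptions made for illustration.

```c
/* votes[m] is nonzero when method m (for example heuristic rules,
   neural net or ANOVA model) predicts that the probe will perform poorly.
   Returns 1 when at least min_votes methods agree that it should go. */
int should_prune(const int *votes, int n_methods, int min_votes)
{
    int against = 0;

    for (int m = 0; m < n_methods; m++)
        if (votes[m])
            against++;
    return against >= min_votes;
}
```

Setting min_votes to 2 with three methods reproduces the example given in the text.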
Fig. 11 shows the flow of a process of increasing the number of probes for 
monitoring the expression of genes after the number of probes has been reduced or pruned. 
In one embodiment, a user is able to specify the number of nucleic acid probes that should 
be placed on the chip to monitor the expression of each gene. As discussed above, it is 
advantageous to reduce probes that will not likely produce good results; however, the 
number of probes may be reduced to substantially less than the desired number of probes. 

At step 402, the number of probes for monitoring multiple genes is reduced 
by the heuristic rules method, neural net, ANOVA model, synthesis cycle pruning, or any 
other method, or combination of methods. A gene is selected at step 404. 

A determination is made whether the remaining probes for monitoring the 
selected gene number greater than 80% (which may be varied or user defined) of the 
desired number of probes. If yes, the computer system proceeds to the next gene at step 
408 which will generally return to step 404. 

If the remaining probes for monitoring the selected gene do not number 
greater than 80% of the desired number of probes, a determination is made whether the 
remaining probes for monitoring the selected gene number greater than 40% (which may be 
varied or user defined) of the desired number of probes. If yes, an "i" is appended to the 
end of the gene name at step 412 to indicate that after pruning, the probes were incomplete. 

At step 414, the number of probes is increased by loosening the constraints 
that rejected probes. For example, the thresholds in the heuristic rules may be increased by 
1. Therefore, if previously probes were rejected if they had four As in a row, the rule may 
be loosened to five As in a row. 

A determination is then made whether the remaining probes for monitoring 
the selected gene number greater than 80% of the desired number of probes at step 416. If 
yes, an "r" is appended to the end of the gene name at step 412 to indicate that the rules 
were loosened to generate the number of synthesized probes for that gene. 
simultaneous coupling at a number of reaction sites, into a different heterogeneous array. 
See, U.S. Application Serial Nos. 07/796,243 and 07/980,523. 

The development of VLSIPS™ technology as described in the above-noted 
U.S. Patent No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 
92/10092, is considered pioneering technology in the fields of combinatorial synthesis 
and screening of combinatorial libraries. More recently, patent application Serial No. 
08/082,937, filed June 25, 1993 describes methods for making arrays of oligonucleotide 
probes that can be used to check or determine a partial or complete sequence of a target 
nucleic acid and to detect the presence of a nucleic acid containing a specific 
oligonucleotide sequence. 

In brief, the light-directed combinatorial synthesis of oligonucleotide 
arrays on a glass surface proceeds using automated phosphoramidite chemistry and chip 
masking techniques. In one specific implementation, a glass surface is derivatized with 
a silane reagent containing a functional group, e.g., a hydroxyl or amine group blocked 
by a photolabile protecting group. Photolysis through a photolithographic mask is used 
selectively to expose functional groups which are then ready to react with incoming 
5'-photoprotected nucleoside phosphoramidites. The phosphoramidites react only with 
those sites which are illuminated (and thus exposed by removal of the photolabile 
blocking group). Thus, the phosphoramidites only add to those areas selectively exposed 
from the preceding step. These steps are repeated until the desired array of sequences 
has been synthesized on the solid surface. Combinatorial synthesis of different 
oligonucleotide analogues at different locations on the array is determined by the pattern 
of illumination during synthesis and the order of addition of coupling reagents. 

In the event that an oligonucleotide analogue with a polyamide backbone 
is used in the VLSIPS™ procedure, it is generally inappropriate to use phosphoramidite 
chemistry to perform the synthetic steps, since the monomers do not attach to one 
another via a phosphate linkage. Instead, peptide synthetic methods are substituted. 
See, e.g., Pirrung et al., U.S. Pat. No. 5,143,854. 

Peptide nucleic acids are commercially available from, e.g., Biosearch, 
Inc. (Bedford, MA) which comprise a polyamide backbone and the bases found in 
naturally occurring nucleosides. Peptide nucleic acids are capable of binding to nucleic 
the monomer B is flowed through or placed in the second flow channel(s), binding 
monomer B at the second selected locations. In this particular example, the resulting 
sequences bound to the substrate at this stage of processing will be, for example, A, B, 
and AB. The process is repeated to form a vast array of sequences of desired length at 
known locations on the substrate. 

After the substrate is activated, monomer A can be flowed through some 
of the channels, monomer B can be flowed through other channels, a monomer C can be 
flowed through still other channels, etc. In this manner, many or all of the reaction 
regions are reacted with a monomer before the channel block must be moved or the 
substrate must be washed and/or reactivated. By making use of many or all of the 
available reaction regions simultaneously, the number of washing and activation steps 
can be minimized. 

One of skill in the art will recognize that there are alternative methods of 
forming channels or otherwise protecting a portion of the surface of the substrate. For 
example, according to some embodiments, a protective coating such as a hydrophilic or 
hydrophobic coating (depending upon the nature of the solvent) is utilized over portions 
of the substrate to be protected, sometimes in combination with materials that facilitate 
wetting by the reactant solution in other regions. In this manner, the flowing solutions 
are further prevented from passing outside of their designated flow paths. 

The "spotting" methods of preparing compounds and libraries of the 
present invention can be implemented in much the same manner as the flow channel 
methods. For example, a monomer A can be delivered to and coupled with a first group 
of reaction regions which have been appropriately activated. Thereafter, a monomer B 
can be delivered to and reacted with a second group of activated reaction regions. 
Unlike the flow channel embodiments described above, reactants are delivered by 
directly depositing (rather than flowing) relatively small quantities of them in selected 
regions. In some steps, of course, the entire substrate surface can be sprayed or 
otherwise coated with a solution. In preferred embodiments, a dispenser moves from 
region to region, depositing only as much monomer as necessary at each stop. Typical 
dispensers include a micropipette to deliver the monomer solution to the substrate and a 
robotic system to control the position of the micropipette with respect to the substrate. 
signal intensity greater than approximately 10% of the background intensity. Thus, in a 
preferred embodiment, the hybridized array may be washed at successively higher 
stringency solutions and read between each wash. Analysis of the data sets thus 
produced will reveal a wash stringency above which the hybridization pattern is not 
appreciably altered and which provides adequate signal for the particular oligonucleotide 
probes of interest. 

In a preferred embodiment, background signal is reduced by the use of a 
detergent (e.g., C-TAB) or a blocking reagent (e.g., sperm DNA, cot-1 DNA, etc.) 
during the hybridization to reduce non-specific binding. In a particularly preferred 
embodiment, the hybridization is performed in the presence of about 0.5 mg/ml DNA 
(e.g., herring sperm DNA). The use of blocking agents in hybridization is well known 
to those of skill in the art (see, e.g., Chapter 8 in P. Tijssen, supra). 

The stability of duplexes formed between RNAs or DNAs is generally in 
the order of RNA:RNA > RNA:DNA > DNA:DNA, in solution. Long probes have 
better duplex stability with a target, but poorer mismatch discrimination than shorter 
probes (mismatch discrimination refers to the measured hybridization signal ratio 
between a perfect match probe and a single base mismatch probe). Shorter probes (e.g., 
8-mers) discriminate mismatches very well, but the overall duplex stability is low. 

Altering the thermal stability (Tm) of the duplex formed between the target 
and the probe using, e.g., known oligonucleotide analogues allows for optimization of 
duplex stability and mismatch discrimination. One useful aspect of altering the Tm arises 
from the fact that adenine-thymine (A-T) duplexes have a lower Tm than 
guanine-cytosine (G-C) duplexes, due in part to the fact that the A-T duplexes have 2 hydrogen 
bonds per base pair, while the G-C duplexes have 3 hydrogen bonds per base pair. In 
heterogeneous oligonucleotide arrays in which there is a non-uniform distribution of 
bases, it is not generally possible to optimize hybridization for each oligonucleotide 
probe simultaneously. Thus, in some embodiments, it is desirable to selectively 
destabilize G-C duplexes and/or to increase the stability of A-T duplexes. This can be 
accomplished, e.g., by substituting guanine residues in the probes of an array which 
form G-C duplexes with hypoxanthine, or by substituting adenine residues in probes 
which form A-T duplexes with 2,6-diaminopurine, or by using the salt tetramethylammonium 
chloride (TMACl) in place of NaCl. 

Altered duplex stability conferred by using oligonucleotide analogue 
probes can be ascertained by following, e.g., fluorescence signal intensity of 
oligonucleotide analogue arrays hybridized with a target oligonucleotide over time. The 
data allow optimization of specific hybridization conditions at, e.g., room temperature 
(for simplified diagnostic applications in the future). 

Another way of verifying altered duplex stability is by following the 
signal intensity generated upon hybridization with time. Previous experiments using 
DNA targets and DNA chips have shown that signal intensity increases with time, and 
that the more stable duplexes generate higher signal intensities faster than less stable 
duplexes. The signals reach a plateau or "saturate" after a certain amount of time due to 
all of the binding sites becoming occupied. These data allow for optimization of 
hybridization, and determination of the best conditions at a specified temperature. 

Methods of optimizing hybridization conditions are well known to those 
of skill in the art (see, e.g., Laboratory Techniques in Biochemistry and Molecular 
Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, 
N.Y., (1993)). 

VII. Signal Detection. 

Means of detecting labeled target (sample) nucleic acids hybridized to the 
probes of the high density array are known to those of skill in the art. Thus, for 
example, where a colorimetric label is used, simple visualization of the label is 
sufficient. Where a radioactive labeled probe is used, detection of the radiation (e.g., 
with photographic film or a solid state detector) is sufficient. 

In a preferred embodiment, however, the target nucleic acids are labeled 
with a fluorescent label and the localization of the label on the probe array is 
accomplished with fluorescent microscopy. The hybridized array is excited with a light 
source at the excitation wavelength of the particular fluorescent label and the resulting 
fluorescence at the emission wavelength is detected. In a particularly preferred 
embodiment, the excitation light source is a laser appropriate for the excitation of the 
fluorescent label. 

The confocal microscope may be automated with a computer-controlled 
stage to automatically scan the entire high density array. Similarly, the microscope may 
be equipped with a phototransducer (e.g., a photomultiplier, a solid state array, a CCD 
camera, etc.) attached to an automated data acquisition system to automatically record 
the fluorescence signal produced by hybridization to each oligonucleotide probe on the 
array. Such automated systems are described at length in U.S. Patent No. 5,143,854, 
PCT Application WO 92/10092, and copending U.S.S.N. 08/195,889 filed on February 
10, 1994. Use of laser illumination in conjunction with automated confocal microscopy 
for signal detection permits detection at a resolution of better than about 100 μm, more 
preferably better than about 50 μm, and most preferably better than about 25 μm. 

VIII. Signal Evaluation. 

One of skill in the art will appreciate that methods for evaluating the 
hybridization results vary with the nature of the specific probe nucleic acids used as well 
as the controls provided. In the simplest embodiment, simple quantification of the 
fluorescence intensity for each probe is determined. This is accomplished simply by 
measuring probe signal strength at each location (representing a different probe) on the 
high density array (e.g., where the label is a fluorescent label, detection of the amount 
of fluorescence (intensity) produced by a fixed excitation illumination at each location on 
the array). Comparison of the absolute intensities of an array hybridized to nucleic acids 
from a "test" sample with intensities produced by a "control" sample provides a measure 
of the relative expression of the nucleic acids that hybridize to each of the probes. 

One of skill in the art, however, will appreciate that hybridization signals 
will vary in strength with efficiency of hybridization, the amount of label on the sample 
nucleic acid and the amount of the particular nucleic acid in the sample. Typically 
nucleic acids present at very low levels (e.g., < 1 pM) will show a very weak signal. At 
some low level of concentration, the signal becomes virtually indistinguishable from 
background. In evaluating the hybridization data, a threshold intensity value may be 
selected below which a signal is not counted, as being essentially indistinguishable from 
background. 

Where it is desirable to detect nucleic acids expressed at lower levels, a 
lower threshold is chosen. Conversely, where only high expression levels are to be 
evaluated a higher threshold level is selected. In a preferred embodiment, a suitable 
threshold is about 10% above that of the average background signal. 

In addition, the provision of appropriate controls permits a more detailed 
analysis that controls for variations in hybridization conditions, cell health, non-specific 
binding and the like. Thus, for example, in a preferred embodiment, the hybridization 
array is provided with normalization controls as described above in Section IV.A.2. 
These normalization controls are probes complementary to control sequences added in a 
known concentration to the sample. Where the overall hybridization conditions are 
poor, the normalization controls will show a smaller signal reflecting reduced 
hybridization. Conversely, where hybridization conditions are good, the normalization 
controls will provide a higher signal reflecting the improved hybridization. 
Normalization of the signal derived from other probes in the array to the normalization 
controls thus provides a control for variations in hybridization conditions. Typically, 
normalization is accomplished by dividing the measured signal from the other probes in 
the array by the average signal produced by the normalization controls. Normalization 
may also include correction for variations due to sample preparation and amplification. 
Such normalization may be accomplished by dividing the measured signal by the average 
signal from the sample preparation/amplification control probes (e.g., the Bio B probes). 
The resulting values may be multiplied by a constant value to scale the results. 
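
The normalization arithmetic described above amounts to a division by the mean control signal followed by an optional scaling, as in this minimal sketch. The function name, the parameter names and the error convention are assumptions for illustration.

```c
#include <stddef.h>

/* Divides each measured signal by the average of the normalization
   control signals and multiplies by scale. Writes results to out.
   Returns 0 on success, or -1 when there are no controls or their
   average is not positive (normalization is then undefined). */
int normalize_signals(const double *signal, size_t n,
                      const double *controls, size_t n_controls,
                      double scale, double *out)
{
    double sum = 0.0, avg;

    if (n_controls == 0)
        return -1;
    for (size_t i = 0; i < n_controls; i++)
        sum += controls[i];
    avg = sum / (double)n_controls;
    if (avg <= 0.0)
        return -1;
    for (size_t i = 0; i < n; i++)
        out[i] = signal[i] / avg * scale;
    return 0;
}
```

The same routine could be applied a second time with the sample preparation/amplification control probes supplied as the controls.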

As indicated above, the high density array can include mismatch controls. 
In a preferred embodiment, there is a mismatch control having a central mismatch for 
every probe (except the normalization controls) in the array. It is expected that after 
washing in stringent conditions, where a perfect match would be expected to hybridize to 
the probe, but not to the mismatch, the signal from the mismatch controls should only 
reflect non-specific binding or the presence in the sample of a nucleic acid that 
hybridizes with the mismatch. Where the probe in question and its corresponding 
mismatch control both show high signals, or the mismatch shows a higher signal than its 
corresponding test probe, there is a problem with the hybridization and the signal from 
those probes is ignored. The difference in hybridization signal intensity between the 
target specific probe and its corresponding mismatch control is a measure of the 
discrimination of the target-specific probe. Thus, in a preferred embodiment, the signal 
of the mismatch probe is subtracted from the signal from its corresponding test probe to 
provide a measure of the signal due to specific binding of the test probe. 

The concentration of a particular sequence can then be determined by 
measuring the signal intensity of each of the probes that bind specifically to that gene 
and normalizing to the normalization controls. Where the signal from the probes is 
greater than the mismatch, the mismatch is subtracted. Where the mismatch intensity is 
equal to or greater than its corresponding test probe, the signal is ignored. The 
expression level of a particular gene can then be scored by the number of positive signals 
(either absolute or above a threshold value), the intensity of the positive signals (either 
absolute or above a selected threshold value), or a combination of both metrics (e.g., a 
weighted average). 
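
One way to sketch this scoring is shown below. The struct GeneScore, the function score_gene and the choice to report the mean mismatch-subtracted intensity of the positive signals are illustrative assumptions; the text leaves the exact combination of the two metrics open.

```c
#include <stddef.h>

/* Hypothetical summary of one gene's probe set. */
typedef struct {
    int    positives;       /* number of positive signals */
    double mean_intensity;  /* mean (probe - mismatch) over the positives */
} GeneScore;

/* probe[i] and mismatch[i] are the signals of the i-th test probe and its
   mismatch control. Pairs whose mismatch is equal to or greater than the
   probe are ignored; remaining differences count as positive when they
   exceed threshold (pass 0.0 to count every positive difference). */
GeneScore score_gene(const double *probe, const double *mismatch,
                     size_t n, double threshold)
{
    GeneScore s = {0, 0.0};
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        double diff;
        if (mismatch[i] >= probe[i])
            continue;                    /* signal is ignored */
        diff = probe[i] - mismatch[i];   /* mismatch is subtracted */
        if (diff > threshold) {
            s.positives++;
            sum += diff;
        }
    }
    if (s.positives > 0)
        s.mean_intensity = sum / s.positives;
    return s;
}
```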

It is a surprising discovery of this invention that normalization controls 
are often unnecessary for useful quantification of a hybridization signal. Thus, where 
optimal probes have been identified in the two step selection process as described above, 
in Section II.B., the average hybridization signal produced by the selected optimal 
probes provides a good quantified measure of the concentration of hybridized nucleic 
acid. 

Computer-Implemented Expression Monitoring 

The methods of monitoring gene expression of this invention may be 
performed utilizing a computer. The computer typically runs a software program that 
includes computer code incorporating the invention for analyzing hybridization 
intensities measured from a substrate or chip and thus, monitoring the expression of one 
or more genes. Although the following will describe specific embodiments of the 
invention, the invention is not limited to any one embodiment so the following is for 
purposes of illustration and not limitation. 
Fig. 6 illustrates an example of a computer system used to execute the 
software of an embodiment of the present invention. As shown, computer 
system 100 includes a monitor 102, screen 104, cabinet 106, keyboard 108, and mouse 
110. Mouse 110 may have one or more buttons such as mouse buttons 112. Cabinet 
106 houses a CD-ROM drive 114, a system memory and a hard drive (both shown in 
Fig. 7) which may be utilized to store and retrieve software programs incorporating 
computer code that implements the invention, data for use with the invention, and the 
like. Although a CD-ROM 116 is shown as an exemplary computer readable storage 
medium, other computer readable storage media including floppy disks, tape, flash 
memory, system memory, and hard drives may be utilized. Cabinet 106 also houses 
familiar computer components (not shown) such as a central processor, system memory, 
hard disk, and the like. 

Fig. 7 shows a system block diagram of computer system 100 used to 
execute the software of an embodiment of the present invention. As in Fig. 6, computer 
system 100 includes monitor 102 and keyboard 108. Computer system 100 further 
includes subsystems such as a central processor 120, system memory 122, I/O controller 
124, display adapter 126, removable disk 128 (e.g., CD-ROM drive), fixed disk 130 
(e.g., hard drive), network interface 132, and speaker 134. Other computer systems 
suitable for use with the present invention may include additional or fewer subsystems. 
For example, another computer system could include more than one processor 120 (i.e., 
a multi-processor system) or a cache memory. 

Arrows such as 136 represent the system bus architecture of computer 
system 100. However, these arrows are illustrative of any interconnection scheme 
serving to link the subsystems. For example, a local bus could be utilized to connect 
the central processor to the system memory and display adapter. Computer system 100 
shown in Fig. 7 is but an example of a computer system suitable for use with the present 
invention. Other configurations of subsystems suitable for use with the present invention 
will be readily apparent to one of ordinary skill in the art. 

Fig. 8 shows a flowchart of a process of monitoring the expression of a 
gene. The process compares hybridization intensities of pairs of perfect match and 
mismatch probes that are preferably covalently attached to the surface of a substrate or 
chip. Most preferably, the nucleic acid probes have a density greater than about 60 
different nucleic acid probes per 1 cm² of the substrate. Although the flowcharts show a 
sequence of steps for clarity, this is not an indication that the steps must be performed in 
this specific order. One of ordinary skill in the art would readily recognize that many of 
the steps may be reordered, combined, and deleted without departing from the invention. 

Initially, nucleic acid probes are selected that are complementary to the 
target sequence (or gene). These probes are the perfect match probes. Another set of 
probes is specified that are intended to be not perfectly complementary to the target 
sequence. These probes are the mismatch probes and each mismatch probe includes at 
least one nucleotide mismatch from a perfect match probe. Accordingly, a mismatch 
probe and the perfect match probe from which it was derived make up a pair of probes. 
As mentioned earlier, the nucleotide mismatch is preferably near the center of the 
mismatch probe. 

The probe lengths of the perfect match probes are typically chosen to 
exhibit high hybridization affinity with the target sequence. For example, the nucleic 
acid probes may be all 20-mers. However, probes of varying lengths may also be 
synthesized on the substrate for any number of reasons including resolving ambiguities. 

The target sequence is typically fragmented, labeled and exposed to a 
substrate including the nucleic acid probes as described earlier. The hybridization 
intensities of the nucleic acid probes are then measured and input into a computer system. 
The computer system may be the same system that directs the substrate hybridization or 
it may be a different system altogether. Of course, any computer system for use with 
the invention should have available other details of the experiment including possibly the 
gene name, gene sequence, probe sequences, probe locations on the substrate, and the 
like. 

Referring to Fig. 8, after hybridization, the computer system receives 
input of hybridization intensities of the multiple pairs of perfect match and mismatch 
probes at step 202. The hybridization intensities indicate hybridization affinity between 
the nucleic acid probes and the target nucleic acid (which corresponds to a gene). Each 
pair includes a perfect match probe that is perfectly complementary to a portion of the 
target nucleic acid and a mismatch probe that differs from the perfect match probe by at 
least one nucleotide. 

At step 204, the computer system compares the hybridization intensities of 
the perfect match and mismatch probes of each pair. If the gene is expressed, the 
hybridization intensity (or affinity) of a perfect match probe of a pair should be 
recognizably higher than the corresponding mismatch probe. Generally, if the 
hybridization intensities of a pair of probes are substantially the same, it may indicate 
the gene is not expressed. However, the determination is not based on a single pair of 
probes; the determination of whether a gene is expressed is based on an analysis of many 
pairs of probes. An exemplary process of comparing the hybridization intensities of the 
pairs of probes will be described in more detail in reference to Fig. 9. 

After the system compares the hybridization intensity of the perfect match
and mismatch probes, the system indicates expression of the gene at step 206. As an
example, the system may indicate to a user that the gene is either present (expressed),
marginal or absent (unexpressed).

Fig. 9 shows a flowchart of a process of determining if a gene is
expressed utilizing a decision matrix. At step 252, the computer system receives raw
scan data of N pairs of perfect match and mismatch probes. In a preferred embodiment,
the hybridization intensities are photon counts from a fluorescein labeled target that has
hybridized to the probes on the substrate. For simplicity, the hybridization intensity of a
perfect match probe will be designated "Ipm" and the hybridization intensity of a mismatch
probe will be designated "Imm."

Hybridization intensities for a pair of probes are retrieved at step 254. The
background signal intensity is subtracted from each of the hybridization intensities of the
pair at step 256. Background subtraction may also be performed on all the raw scan data
at the same time.

At step 258, the hybridization intensities of the pair of probes are
compared to a difference threshold (D) and a ratio threshold (R). It is determined if the
difference between the hybridization intensities of the pair (Ipm - Imm) is greater than or
equal to the difference threshold AND the quotient of the hybridization intensities of the
pair (Ipm / Imm) is greater than or equal to the ratio threshold. The difference and ratio
thresholds are typically user defined values that have been determined to produce accurate
expression monitoring of a gene or genes. In one embodiment, the difference threshold
is 20 and the ratio threshold is 1.2.

If Ipm - Imm >= D and Ipm / Imm >= R, the value NPOS is incremented at
step 260. In general, NPOS is a value that indicates the number of pairs of probes
which have hybridization intensities indicating that the gene is likely expressed. NPOS
is utilized in a determination of the expression of the gene.

At step 262, it is determined if Imm - Ipm >= D and Imm / Ipm >= R. If
this expression is true, the value NNEG is incremented at step 264. In general, NNEG
is a value that indicates the number of pairs of probes which have hybridization
intensities indicating that the gene is likely not expressed. NNEG, like NPOS, is
utilized in a determination of the expression of the gene.

For each pair that exhibits hybridization intensities either indicating the
gene is expressed or not expressed, a log ratio value (LR) and intensity difference value
(IDIF) are calculated at step 266. LR is calculated as the log of the quotient of the
hybridization intensities of the pair (Ipm / Imm). The IDIF is calculated as the difference
between the hybridization intensities of the pair (Ipm - Imm). If there is a next pair of
hybridization intensities at step 268, they are retrieved at step 254.
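
The per-pair loop of steps 254 through 268 can be sketched in Python as follows. This is only an illustrative sketch: the function name, the dictionary it returns, the use of a base 10 logarithm, and the choice to skip pairs whose background-subtracted intensities are not positive are all assumptions, since the text does not specify them. The default thresholds are the values given for one embodiment (difference threshold 20, ratio threshold 1.2).

```python
import math

def score_probe_pairs(pairs, background=0.0, d_threshold=20.0, r_threshold=1.2):
    """Tally NPOS/NNEG and collect LR and IDIF values for one gene.

    pairs: list of (raw_pm, raw_mm) hybridization intensities.
    """
    npos = nneg = 0
    log_ratios, idifs = [], []
    for raw_pm, raw_mm in pairs:
        # Step 256: background subtraction.
        pm = raw_pm - background
        mm = raw_mm - background
        if pm <= 0 or mm <= 0:
            # Assumption: non-positive intensities are skipped because the
            # ratio and its logarithm are undefined for them.
            continue
        if pm - mm >= d_threshold and pm / mm >= r_threshold:
            npos += 1      # step 260
        elif mm - pm >= d_threshold and mm / pm >= r_threshold:
            nneg += 1      # step 264
        else:
            continue       # neither test passed; no LR or IDIF is kept
        # Step 266: log ratio (base 10 assumed) and intensity difference.
        log_ratios.append(math.log10(pm / mm))
        idifs.append(pm - mm)
    return {"N": len(pairs), "NPOS": npos, "NNEG": nneg,
            "LR": log_ratios, "IDIF": idifs}


result = score_probe_pairs([(120, 40), (30, 100), (50, 45)], background=10)
```

In this made-up example the first pair increments NPOS, the second increments NNEG, and the third passes neither test, so it contributes to N only.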

At step 272, a decision matrix is utilized to indicate if the gene is
expressed. The decision matrix utilizes the values N, NPOS, NNEG, and LR (multiple
LRs). The following three assignments are performed:

P1 = NPOS / NNEG

P2 = NPOS / N

P3 = (10 * SUM(LR)) / (NPOS + NNEG)

These P values are then utilized to determine if the gene is expressed.

For purposes of illustration, the P values are broken down into ranges. If
P1 is greater than or equal to 2.1, then A is true. If P1 is less than 2.1 and greater than
or equal to 1.8, then B is true. Otherwise, C is true. Thus, P1 is broken down into
three ranges A, B and C. This is done to aid the reader's understanding of the invention.
Thus, all of the P values are broken down into ranges according to the
following:

A = (P1 >= 2.1)
B = (2.1 > P1 >= 1.8)
C = (P1 < 1.8)

X = (P2 >= 0.35)
Y = (0.35 > P2 >= 0.20)
Z = (P2 < 0.20)

Q = (P3 >= 1.5)
R = (1.5 > P3 >= 1.1)
S = (P3 < 1.1)

Once the P values are broken down into ranges according to the above boolean values,
the gene expression is determined.

The gene expression is indicated as present (expressed), marginal or
absent (not expressed). The gene is indicated as expressed if the following expression is
true: A and (X or Y) and (Q or R). In other words, the gene is indicated as expressed
if P1 >= 2.1, P2 >= 0.20 and P3 >= 1.1. Additionally, the gene is indicated as
expressed if the following expression is true: B and X and Q.

With the foregoing explanation, the following is a summary of the gene
expression indications:

Present     A and (X or Y) and (Q or R)
            B and X and Q

Marginal    A and X and S
            B and X and R
            B and Y and (Q or R)

Absent      All other cases (e.g., any C combination)
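
The decision matrix summarized above can be sketched in Python as follows. The function name and the single-letter return codes are illustrative choices, and treating P1 as infinite when NNEG is zero (and P3 as zero when no pair was scored) is an assumption, since the text does not say how those cases are handled.

```python
def absolute_call(n, npos, nneg, log_ratios):
    """Return "P" (present), "M" (marginal) or "A" (absent) for one gene."""
    # Assumption: with no negative pairs, P1 is treated as unbounded.
    p1 = npos / nneg if nneg else float("inf")
    p2 = npos / n
    scored = npos + nneg
    # Assumption: with no scored pairs, P3 is taken as zero.
    p3 = 10 * sum(log_ratios) / scored if scored else 0.0

    # Ranges for P1, P2 and P3 as defined in the text.
    a, b = p1 >= 2.1, 1.8 <= p1 < 2.1
    x, y = p2 >= 0.35, 0.20 <= p2 < 0.35
    q, r, s = p3 >= 1.5, 1.1 <= p3 < 1.5, p3 < 1.1

    if (a and (x or y) and (q or r)) or (b and x and q):
        return "P"
    if (a and x and s) or (b and x and r) or (b and y and (q or r)):
        return "M"
    return "A"   # all other cases, including any C or Z combination


call = absolute_call(20, 12, 2, [0.5] * 14)
```

With these made-up counts, P1 = 6, P2 = 0.6 and P3 = 5, so the ranges are A, X and Q and the gene is called present.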

In the output to the user, present may be indicated as "P," marginal as "M" and absent
as "A" at step 274.

Once all the pairs of probes have been processed and the expression of the
gene indicated, an average of ten times the LRs is computed at step 275. Additionally,
an average of the IDIF values for the probes that incremented NPOS and NNEG is
calculated. These values may be utilized for quantitative comparisons of this
experiment with other experiments.

Quantitative measurements may be performed at step 276. For example,
the current experiment may be compared to a previous experiment (e.g., utilizing values
calculated at step 270). Additionally, the experiment may be compared to hybridization
intensities of RNA (such as from bacteria) present in the biological sample in a known
quantity. In this manner, one may verify the correctness of the gene expression
indication or call, modify threshold values, or perform any number of modifications of
the preceding.

For simplicity, Fig. 9 was described in reference to a single gene.
However, the process may be utilized on multiple genes in a biological sample.
Therefore, any discussion of the analysis of a single gene is not an indication that the
process may not be extended to processing multiple genes.

Figs. 10A and 10B show the flow of a process of determining the
expression of a gene by comparing baseline scan data and experimental scan data. For
example, the baseline scan data may be from a biological sample where it is known the
gene is expressed. Thus, this scan data may be compared to a different biological
sample to determine if the gene is expressed. Additionally, it may be determined how
the expression of a gene or genes changes over time in a biological organism.

At step 302, the computer system receives raw scan data of N pairs of
perfect match and mismatch probes from the baseline. The hybridization intensity of a
perfect match probe from the baseline will be designated "Ipm" and the hybridization
intensity of a mismatch probe from the baseline will be designated "Imm." The background
signal intensity is subtracted from each of the hybridization intensities of the pairs of
baseline scan data at step 304.

At step 306, the computer system receives raw scan data of N pairs of
perfect match and mismatch probes from the experimental biological sample. The
hybridization intensity of a perfect match probe from the experiment will be designated
"Jpm" and the hybridization intensity of a mismatch probe from the experiment will be
designated "Jmm." The background signal intensity is subtracted from each of the
hybridization intensities of the pairs of experimental scan data at step 308.

The hybridization intensities of an I and J pair may be normalized at step 
310. For example, the hybridization intensities of the I and J pairs may be divided by 
the hybridization intensity of control probes as discussed in Section II.A.2. 

At step 312, the hybridization intensities of the I and J pair of probes are
compared to a difference threshold (DDIF) and a ratio threshold (RDIF). It is
determined if the difference between the hybridization intensities of the one pair (Jpm -
Jmm) and the other pair (Ipm - Imm) is greater than or equal to the difference threshold
AND the quotient of the hybridization intensities of one pair (Jpm - Jmm) and the other
pair (Ipm - Imm) is greater than or equal to the ratio threshold. The difference and ratio
thresholds are typically user defined values that have been determined to produce accurate
expression monitoring of a gene or genes.

If (Jpm - Jmm) - (Ipm - Imm) >= DDIF and (Jpm - Jmm) / (Ipm - Imm) >=
RDIF, the value NINC is incremented at step 314. In general, NINC counts the
pairs of probes indicating that the gene expression is likely greater (or increased)
in the experimental sample than in the baseline sample. NINC is utilized in a determination of
whether the expression of the gene is greater (or increased), less (or decreased) or did
not change in the experimental sample compared to the baseline sample.

At step 316, it is determined if (Jpm - Jmm) - (Ipm - Imm) >= DDIF and (Jpm
- Jmm) / (Ipm - Imm) >= RDIF. If this expression is true, NDEC is incremented. In
general, NDEC counts the pairs of probes indicating that the gene expression is likely
less (or decreased) in the experimental sample than in the baseline sample. NDEC is utilized
in a determination of whether the expression of the gene is greater (or increased), less
(or decreased) or did not change in the experimental sample compared to the baseline
sample.

For each of the pairs that exhibits hybridization intensities either
indicating the gene is expressed more or less in the experimental sample, the values
NPOS, NNEG and LR are calculated for each pair of probes. These values are
calculated as discussed above in reference to Fig. 9. A suffix of either "B" or "E" has
been added to each value in order to indicate if the value denotes the baseline sample or
the experimental sample, respectively. If there are next pairs of hybridization intensities
at step 322, they are processed in a similar manner as shown.



Referring now to Fig. 10B, an absolute decision computation is performed
for both the baseline and experimental samples at step 324. The absolute decision
computation is an indication of whether the gene is expressed, marginal or absent in each
of the baseline and experimental samples. Accordingly, in a preferred embodiment, this
step entails performing steps 272 and 274 from Fig. 9 for each of the samples. This
being done, there is an indication of gene expression for each of the samples taken
alone.



At step 326, a decision matrix is utilized to determine the difference in
gene expression between the two samples. This decision matrix utilizes the values N,
NPOSB, NPOSE, NNEGB, NNEGE, NINC, NDEC, LRB, and LRE as they were
calculated above. The decision matrix performs different calculations depending on
whether NINC is greater than or equal to NDEC. The calculations are as follows.



If NINC >= NDEC, the following four P values are determined:

P1 = NINC / NDEC
P2 = NINC / N
P3 = ((NPOSE - NPOSB) - (NNEGE - NNEGB)) / N
P4 = 10 * SUM(LRE - LRB) / N



These P values are then utilized to determine the difference in gene expression between 
the two samples. 

For purposes of illustration, the P values are broken down into ranges as 
was done previously. Thus, all of the P values are broken down into ranges according 
to the following: 



A = (P1 >= 2.7)
B = (2.7 > P1 >= 1.8)
C = (P1 < 1.8)

X = (P2 >= 0.24)
Y = (0.24 > P2 >= 0.16)
Z = (P2 < 0.16)

M = (P3 >= 0.17)
N = (0.17 > P3 >= 0.10)
O = (P3 < 0.10)

Q = (P4 >= 1.3)
R = (1.3 > P4 >= 0.9)
S = (P4 < 0.9)

Increased            A and (X or Y) and (Q or R) and (M or N or O)
                     A and (X or Y) and (Q or R or S) and (M or N)
                     B and (X or Y) and (Q or R) and (M or N)
                     A and X and (Q or R or S) and (M or N or O)

Marginal Increase    A or Y or S or O
                     B and (X or Y) and (Q or R) and O
                     B and (X or Y) and S and (M or N)
                     C and (X or Y) and (Q or R) and (M or N)

No Change            All other cases (e.g., any Z combination)

In the output to the user, increased may be indicated as "I," marginal increase as
"MI" and no change as "NC."

If NINC < NDEC, the following four P values are determined:

P1 = NDEC / NINC
P2 = NDEC / N
P3 = ((NNEGE - NNEGB) - (NPOSE - NPOSB)) / N
P4 = 10 * SUM(LRE - LRB) / N

These P values are then utilized to determine the difference in gene expression between
the two samples.

The P values are broken down into the same ranges as for the other case
where NINC >= NDEC. Thus, P values in this case indicate the same ranges and will
not be repeated for the sake of brevity. However, the ranges generally indicate different
changes in the gene expression between the two samples as shown below.
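
The two sets of P values for the comparison, and their breakdown into ranges, can be sketched in Python as follows. This is an illustrative sketch only. The function names are hypothetical; LRB and LRE are assumed to be lists aligned pair by pair so that SUM(LRE - LRB) is a sum of per-pair differences; P1 is assumed to be unbounded when its denominator is zero; and the range boundaries are the values as read from the range definitions given for this decision matrix. The mapping from range letters to a final call is deliberately left out.

```python
def comparison_p_values(n, ninc, ndec, nposb, npose, nnegb, nnege, lrb, lre):
    """Return the branch taken and the tuple (P1, P2, P3, P4)."""
    inf = float("inf")
    # Assumption: lrb and lre hold one log ratio per scored pair, in the
    # same order, so the sum runs over per-pair differences.
    p4 = 10 * sum(e - b for e, b in zip(lre, lrb)) / n
    if ninc >= ndec:
        direction = "increase"
        p1 = ninc / ndec if ndec else inf
        p2 = ninc / n
        p3 = ((npose - nposb) - (nnege - nnegb)) / n
    else:
        direction = "decrease"
        p1 = ndec / ninc if ninc else inf
        p2 = ndec / n
        p3 = ((nnege - nnegb) - (npose - nposb)) / n
    return direction, (p1, p2, p3, p4)


def range_letters(p1, p2, p3, p4):
    """Map the four P values onto the letters A-C, X-Z, M-O and Q-S."""
    first = "A" if p1 >= 2.7 else "B" if p1 >= 1.8 else "C"
    second = "X" if p2 >= 0.24 else "Y" if p2 >= 0.16 else "Z"
    third = "M" if p3 >= 0.17 else "N" if p3 >= 0.10 else "O"
    fourth = "Q" if p4 >= 1.3 else "R" if p4 >= 0.9 else "S"
    return first + second + third + fourth


direction, p_values = comparison_p_values(
    20, 9, 3, 5, 12, 4, 2, [0.25] * 4, [0.75] * 4)
letters = range_letters(*p_values)
```

For these made-up counts NINC >= NDEC, so the first set of formulas applies, giving P1 = 3.0, P2 = 0.45, P3 = 0.45 and P4 = 1.0, which fall in ranges A, X, M and R.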

In this case where NINC < NDEC, the gene expression change is 
indicated as decreased, marginal decrease or no change. The following is a summary of 
the gene expression indications: 






Decreased            A and (X or Y) and (Q or R) and (M or N or O)
                     A and (X or Y) and (Q or R or S) and (M or N)
                     B and (X or Y) and (Q or R) and (M or N)
                     A and X and (Q or R or S) and (M or N or O)

Marginal Decrease    A or Y or S or O
                     B and (X or Y) and (Q or R) and O
                     B and (X or Y) and S and (M or N)
                     C and (X or Y) and (Q or R) and (M or N)

No Change            All other cases (e.g., any Z combination)



In the output to the user, decreased may be indicated as "D," marginal decrease as
"MD" and no change as "NC."

The above has shown that the relative difference in gene
expression between a baseline sample and an experimental sample may be determined.
An additional test may be performed that would change an I, MI, D, or MD (i.e., not
NC) call to NC if the gene is indicated as expressed in both samples (e.g., from step
324) and the following expressions are all true:

Average(IDIFB) >= 200
Average(IDIFE) >= 200
1.4 >= Average(IDIFE) / Average(IDIFB) >= 0.7

Thus, when a gene is expressed in both samples, a call of increased or decreased
(whether marginal or not) will be changed to a no change call if the average intensity
difference for each sample is relatively large and substantially the same for both samples.
The IDIFB and IDIFE are calculated as the sum of all the IDIFs for each sample divided
by N.
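
The additional test can be sketched in Python as follows. The function and argument names are illustrative, and the sketch assumes that "expressed in both samples" means an absolute call of "P" for both the baseline and the experimental sample.

```python
def apply_no_change_override(call, baseline_call, experiment_call,
                             idif_b, idif_e, n):
    """Downgrade an I, MI, D or MD call to NC when both samples are
    expressed and their average intensity differences are large and similar."""
    if call not in {"I", "MI", "D", "MD"}:
        return call
    # Assumption: "expressed" corresponds to the absolute call "P".
    if baseline_call != "P" or experiment_call != "P":
        return call
    avg_b = sum(idif_b) / n   # Average(IDIFB): sum of IDIFs divided by N
    avg_e = sum(idif_e) / n   # Average(IDIFE)
    if avg_b >= 200 and avg_e >= 200 and 0.7 <= avg_e / avg_b <= 1.4:
        return "NC"
    return call


final = apply_no_change_override("I", "P", "P", [300, 500], [400, 520], n=2)
```

Here the averages are 400 and 460, a ratio of 1.15, so the increased call is changed to no change.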

At step 328, values for quantitative difference evaluation are calculated.
An average of ((Jpm - Jmm) - (Ipm - Imm)) for each of the pairs is calculated. Additionally,
a quotient of the average of Jpm - Jmm and the average of Ipm - Imm is calculated. These
values may be utilized to compare the results with other experiments in step 330.

X. Monitoring Expression Levels

As indicated above, the methods of this invention may be used to monitor
expression levels of a gene in a wide variety of contexts. For example, where the effects
of a drug on gene expression are to be determined, the drug will be administered to an
organism, a tissue sample, or a cell. Nucleic acids from the tissue sample, cell, or a
biological sample from the organism and from an untreated organism, tissue sample or
cell are isolated as described above, hybridized to a high density probe array containing
probes directed to the gene of interest, and the expression levels of that gene are
determined as described above.

Similarly, where the expression levels of a disease marker (e.g., P53,
RTK, or HER2) are to be detected (e.g., for the diagnosis of a pathological condition
in a patient), comparison of the expression levels of the disease marker in the sample to
disease markers from a healthy organism will reveal any deviations in the expression
levels of the marker in the test sample as compared to the healthy sample. Correlation
of such deviations with a pathological condition provides a diagnostic assay for that
condition.

EXAMPLES

The following examples are offered to illustrate, but not to limit the 
present invention. 

Example 1. 

First Generation Oligonucleotide Arrays Designed to Measure mRNA Levels for a
Small Number of Murine Cytokines.

A) Preparation of labeled RNA.

1) From each of the preselected genes. 

Fourteen genes (IL-2, IL-3, IL-4, IL-6, IL-10, IL-12p40, GM-CSF, IFN-γ,
TNF-α, CTLA8, β-actin, GAPDH, IL-11 receptor, and Bio B) were each cloned into
the pBluescript II KS (+) phagemid (Stratagene, La Jolla, California, USA). The
orientation of the insert was such that T3 RNA polymerase gave sense transcripts and T7
polymerase gave antisense RNA.

Labeled ribonucleotides were incorporated in an in vitro transcription (IVT) reaction. Either
biotin- or fluorescein-labeled UTP and CTP (1:3 labeled to unlabeled) plus unlabeled
ATP and GTP were used for the reaction with 2500 units of T7 RNA polymerase
(Epicentre Technologies, Madison, Wisconsin, USA). In vitro transcription was done
with cut templates in a manner like that described by Melton et al., Nucleic Acids
Research, 12: 7035-7056 (1984). A typical in vitro transcription reaction used 5 µg
DNA template, a buffer such as that included in Ambion's Maxiscript in vitro
Transcription Kit (Ambion Inc., Huston, Texas, USA) and GTP (3 mM), ATP (1.5
mM), and CTP and fluoresceinated UTP (3 mM total, UTP: Fl-UTP 3:1) or UTP and
fluoresceinated CTP (2 mM total, CTP: Fl-CTP, 3:1). Reactions done in the Ambion
buffer had 20 mM DTT and RNase inhibitor. The reaction was run from 1.5 to about 8
hours.

Following the reaction, unincorporated nucleotide triphosphates were
removed using a size-selective membrane (microcon-100) or Pharmacia microspin S-200
column. The total molar concentration of RNA was based on a measurement of the
absorbance at 260 nm. Following quantitation of RNA amounts, RNA was fragmented
randomly to an average length of approximately 50 - 100 bases by heating at 94°C in 40
mM Tris-acetate pH 8.1, 100 mM potassium acetate, 30 mM magnesium acetate for 30 -
40 minutes. Fragmentation reduces possible interference from RNA secondary
structure, and minimizes the effects of multiple interactions with closely spaced probe
molecules.

Labeled RNA was produced from one of two murine cell lines: T10, a B
cell plasmacytoma which was known not to express the genes (except IL-10, actin and
GAPDH) used as target genes in this study, and 2D6, an IL-12 growth dependent T cell
line (Th1 subtype) that is known to express most of the genes used as target genes in this
study. Thus, RNA derived from the T10 cell line provided a good total RNA baseline
mixture suitable for spiking with known quantities of RNA from the particular target
genes. In contrast, mRNA derived from the 2D6 cell line provided a good positive
control providing typical endogenously transcribed amounts of the RNA from the target
genes.

i) The T10 murine B cell line.

The T10 cell line (B cells) was derived from the IL-6 dependent murine
plasmacytoma line T1165 (Nordan et al. (1986) Science 233: 566-569) by selection in
the presence of IL-11. To prepare the directional cDNA library, total cellular RNA was
isolated from T10 cells using RNAStat60 (Tel-Test B), and poly (A)+ RNA was selected
using the PolyAtract kit (Promega, Madison, Wisconsin, USA). First and second strand
cDNA was synthesized according to Toole et al., (1984) Nature, 312: 342-347, except
that 5-methyldeoxycytidine 5'-triphosphate (Pharmacia LKB, Piscataway, New Jersey,
USA) was substituted for dCTP in both reactions.

To determine cDNA frequencies, T10 libraries were plated, and DNA was
transferred to nitrocellulose filters and probed with 32P-labeled β-actin, GAPDH and
IL-10 probes. Actin was represented at a frequency of 1:3000, GAPDH at 1:1000, and
IL-10 at 1:35,000. Labeled sense and antisense T10 RNA samples were synthesized
from NotI and SfiI cut cDNA libraries in in vitro transcription reactions as described
above.



ii) The 2D6 murine helper T cell line.

The 2D6 cell line is a murine IL-12 dependent T cell line developed by
Fujiwara et al. Cells were cultured in RPMI 1640 medium with 10% heat inactivated
fetal calf serum (JRH Biosciences), 0.05 mM β-mercaptoethanol and recombinant
murine IL-12 (100 units/mL, Genetics Institute, Cambridge, Massachusetts, USA). For
cytokine induction, cells were preincubated overnight in IL-12 free medium and then
resuspended (10^ cells/ml). After incubation for 0, 2, 6 and 24 hours in media
containing 5 nM calcium ionophore A23187 (Sigma Chemical Co., St. Louis, Missouri,
USA) and 100 nM 4-phorbol-12-myristate 13-acetate (Sigma), cells were collected by
centrifugation and washed once with phosphate buffered saline prior to isolation of
RNA.

Labeled 2D6 mRNA was produced by directionally cloning the 2D6
cDNA with λZipLox, NotI-SalI arms available from GibcoBRL in a manner similar to
T10. The linearized pZL1 library was transcribed with T7 to generate sense RNA as
described above.

iii) RNA preparation. 

For material made directly from cellular RNA, cytoplasmic RNA was
extracted from cells by the method of Favaloro et al., (1980) Meth. Enzym., 65:
718-749, and poly (A)+ RNA was isolated with an oligo dT selection step (PolyAtract,
Promega). RNA was amplified using a modification of the procedure described by
Eberwine et al. (1992) Proc. Natl. Acad. Sci. USA, 89: 3010-3014 (see also Van
Gelder et al. (1990) Science 87: 1663-1667). One microgram of poly (A)+ RNA was
converted into double-stranded cDNA using a cDNA synthesis kit (Life Technologies)
with an oligo dT primer incorporating a T7 RNA polymerase promoter site. After
second strand synthesis, the reaction mixture was extracted with phenol/chloroform and
the double-stranded DNA isolated using a membrane filtration step (Microcon-100,
Amicon, Inc. Beverly, Massachusetts, USA). Labeled cRNA was made directly from
the cDNA pool with an IVT step as described above. The total molar concentration of
labeled cRNA was determined from the absorbance at 260 nm and assuming an average
RNA size of 1000 ribonucleotides. RNA concentration was calculated using the
conventional conversion that 1 OD is equivalent to 40 µg of RNA, and that 1 µg of
cellular mRNA consists of 3 pmoles of RNA molecules.
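
The conversion from absorbance to molar amount can be illustrated with a short Python sketch. The function name is hypothetical; the sketch assumes the usual reading of the conversion as 40 µg of RNA per ml for one absorbance unit at 260 nm, and scaling the 3 pmol per µg figure inversely with average length for sizes other than 1000 ribonucleotides is likewise an assumption.

```python
def labeled_rna_concentration(a260, avg_length_nt=1000):
    """Convert an A260 reading into (µg/ml, pmol/ml) of RNA."""
    # 1 OD at 260 nm is taken as 40 µg of RNA (per ml assumed).
    ug_per_ml = a260 * 40.0
    # 1 µg of RNA averaging 1000 nt is taken as 3 pmol; other average
    # lengths are scaled inversely (assumption).
    pmol_per_ug = 3.0 * (1000.0 / avg_length_nt)
    return ug_per_ml, ug_per_ml * pmol_per_ug


ug_per_ml, pmol_per_ml = labeled_rna_concentration(0.5)
```

For an absorbance of 0.5 this gives 20 µg/ml, or 60 pmol/ml under the 1000-ribonucleotide assumption.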

Cellular mRNA was also labeled directly without any intermediate cDNA
or RNA synthesis steps. Poly (A)+ RNA was fragmented as described above, and the 5'
ends of the fragments were kinased and then incubated overnight with a biotinylated
oligoribonucleotide (5'-biotin-AAAAAA-3') in the presence of T4 RNA ligase
(Epicentre Technologies). Alternatively, mRNA was labeled directly by UV-induced
crosslinking to a psoralen derivative linked to biotin (Schleicher & Schuell).

B) High Density Array Preparation.

A high density array of 20 mer oligonucleotide probes was produced
using VLSIPS technology. The high density array included the oligonucleotide probes
as listed in Table 2. A central mismatch control probe was provided for each
gene-specific probe resulting in a high density array containing over 16,000 different
oligonucleotide probes.

Table 2. High density array design. For every probe there was also a mismatch control
having a central 1 base mismatch.

Probe Type                     Target Nucleic Acid    Number of Probes

Test Probes:                   IL-2                   691
                               IL-3                   751
                               IL-4                   361
                               IL-6                   691
                               IL-10                  481
                               IL-12p40               911
                               GM-CSF                 661
                               IFN-γ                  991
                               TNF-α                  641
                               mCTLA8                 391
                               IL-11 receptor         158

House Keeping Genes:           GAPDH                  388
                               β-actin                669

Bacterial gene (sample         Bio B                  286
preparation/amplification
control)



The high density array was synthesized on a planar glass slide. 

C) Array hybridization and scanning.

The RNA transcribed from cDNA was hybridized to the high density
oligonucleotide probe array(s) at low stringency and then washed under more stringent
conditions. The hybridization solutions contained 0.9 M NaCl, 60 mM NaH2PO4, 6
mM EDTA and 0.005% Triton X-100, adjusted to pH 7.6 (referred to as 6x SSPE-T).
In addition, the solutions contained 0.5 mg/ml unlabeled, degraded herring sperm DNA
(Sigma Chemical Co., St. Louis, Missouri, USA). Prior to hybridization, RNA samples
were heated in the hybridization solution to 99°C for 10 minutes, placed on ice for 5
minutes, and allowed to equilibrate at room temperature before being placed in the
hybridization flow cell. Following hybridization, the solution was removed, the arrays
were washed with 6x SSPE-T at 22°C for 7 minutes, and then washed with 0.5x SSPE-T
at 40°C for 15 minutes. When biotin-labeled RNA was used, the hybridized RNA was
stained with a streptavidin-phycoerythrin conjugate (Molecular Probes, Inc., Eugene,
Oregon, USA) prior to reading. Hybridized arrays were stained with 2 µg/ml
streptavidin-phycoerythrin in 6x SSPE-T at 40°C for 5 minutes.

The arrays were read using a scanning confocal microscope (Molecular
Dynamics, Sunnyvale, California, USA) modified for the purpose. The scanner used an
argon ion laser as the excitation source, and the emission was detected with a
photomultiplier tube through either a 530 nm bandpass filter (fluorescein) or a 560 nm
longpass filter (phycoerythrin).

Nucleic acids of either sense or antisense orientations were used in
hybridization experiments. Arrays for either orientation (reverse complements of
each other) were made using the same set of photolithographic masks by reversing the
order of the photochemical steps and incorporating the complementary nucleotide.

D) Quantitative analysis of hybridization patterns and intensities.

The quantitative analysis of the hybridization results involved counting the
instances in which the perfect match probe (PM) was brighter than the corresponding
mismatch probe (MM), averaging the differences (PM minus MM) for each probe
family (i.e., probe collection for each gene), and comparing the values to those obtained
in a side-by-side experiment on an identically synthesized array with an unspiked sample
(if applicable). The advantage of the difference method is that signals from random
cross hybridization contribute equally, on average, to the PM and MM probes while
specific hybridization contributes more to the PM probes. By averaging the pairwise
differences, the real signals add constructively while the contributions from cross
hybridization tend to cancel.
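
The difference method described above can be illustrated with a minimal Python sketch. The function name and the example intensities are made up for illustration; the software actually used is not reproduced here.

```python
def average_difference(pairs):
    """Count pairs with PM brighter than MM and average the PM - MM differences.

    pairs: list of (pm, mm) intensities for one probe family (one gene).
    """
    diffs = [pm - mm for pm, mm in pairs]
    n_brighter = sum(1 for d in diffs if d > 0)
    return n_brighter, sum(diffs) / len(diffs)


n_brighter, avg_diff = average_difference([(150, 100), (90, 100), (300, 120)])
```

In this example two of the three perfect match probes are brighter than their mismatch partners, and the single negative difference partly offsets the positive ones when the average is taken.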

The magnitude of the changes in the average of the difference (PM-MM) 
values was interpreted by comparison with the results of spiking experiments as well as 
the signal observed for the internal standard bacterial RNA spiked into each sample at a 
known amount. Analysis was performed using algorithms and software described 
herein.

E) Optimization of Probe Selection

In order to optimize probe selection for each of the target genes, the high
density array of oligonucleotide probes was hybridized with the mixture of labeled RNAs
transcribed from each of the target genes. Fluorescence intensity at each location on the
high density array was determined by scanning the high density array with a laser
illuminated scanning confocal fluorescence microscope connected to a data acquisition
system.

Probes were then selected for further data analysis in a two-step
procedure. First, in order to be counted, the difference in intensity between a probe and
its corresponding mismatch probe had to exceed a threshold limit (50 counts, or about
half background, in this case). This eliminated from consideration probes that did not
hybridize well and probes for which the mismatch control hybridizes at an intensity
comparable to the perfect match.

Second, the high density array was hybridized to a labeled RNA sample which, in
principle, contains none of the sequences on the high density array. In this case, the
oligonucleotide probes were chosen to be complementary to the sense RNA. Thus, an
anti-sense RNA population should have been incapable of hybridizing to any of the
probes on the array. Where either a probe or its mismatch showed a signal above a
threshold value (100 counts above background) it was not included in subsequent
analysis.

Then, the signal for a particular gene was counted as the average 
difference (perfect match - mismatch control) for the selected probes for each gene. 
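
The two-step probe selection and the resulting per-gene signal can be sketched in Python as follows. The function names, the dictionary layout and the example counts are hypothetical, and all intensities are assumed to be background-subtracted photon counts.

```python
def select_probes(sense, antisense, min_difference=50, max_cross_signal=100):
    """Keep probes passing both selection steps.

    sense, antisense: dicts mapping probe id -> (pm, mm) counts from the
    hybridization with target RNA and with antisense RNA, respectively.
    """
    kept = []
    for probe_id, (pm, mm) in sense.items():
        # Step 1: PM minus MM must exceed the threshold (50 counts here).
        if pm - mm <= min_difference:
            continue
        # Step 2: neither probe of the pair may light up with RNA that
        # should not hybridize (100 counts above background here).
        anti_pm, anti_mm = antisense[probe_id]
        if anti_pm > max_cross_signal or anti_mm > max_cross_signal:
            continue
        kept.append(probe_id)
    return kept


def gene_signal(sense, kept):
    """Average (PM - MM) difference over the selected probes."""
    diffs = [sense[p][0] - sense[p][1] for p in kept]
    return sum(diffs) / len(diffs) if diffs else 0.0


sense = {"p1": (400, 120), "p2": (90, 70), "p3": (500, 150), "p4": (300, 100)}
antisense = {"p1": (20, 15), "p2": (10, 12), "p3": (180, 30), "p4": (40, 35)}
kept = select_probes(sense, antisense)
signal = gene_signal(sense, kept)
```

Here probe p2 fails the difference test and p3 fails the antisense test, so the gene signal is the average difference over p1 and p4.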

E) Results: The high density arrays provide specific and sensitive detection of
target nucleic acids.

As explained above, the initial arrays contained more than 16,000 probes
that were complementary to 12 murine mRNAs - 9 cytokines, 1 cytokine receptor, 2
constitutively expressed genes (β-actin and glyceraldehyde 3-phosphate dehydrogenase) -
1 rat cytokine and 1 bacterial gene (E. coli biotin synthetase, bioB) which serves as a
quantitation reference. The initial experiments with these relatively simple arrays were
designed to determine whether short in situ synthesized oligonucleotides can be made to
hybridize with sufficient sensitivity and specificity to quantitatively detect RNAs in a
complex cellular RNA population. These arrays were intentionally highly redundant,
containing hundreds of oligonucleotide probes per RNA, many more than necessary for
the determination of expression levels. This was done to investigate the hybridization
behavior of a large number of probes and develop general sequence rules for a priori
selection of minimal probe sets for arrays covering substantially larger numbers of
genes.

The oligonucleotide arrays contained collections of pairs of probes for 
each of the RNAs being monitored. Each probe pair consisted of a 20-mer that was 
perfectly complementary (referred to as a perfect match, or PM probe) to a subsequence 
of a particular message, and a companion that was identical except for a single base 
difference in a central position. The mismatch (MM) probe of each pair served as an 
internal control for hybridization specificity. The analysis of PM/MM pairs allowed low 
intensity hybridization patterns from rare RNAs to be sensitively and accurately
recognized in the presence of crosshybridization signals. 

For array hybridization experiments, labeled RNA target samples were
prepared from individual clones, cloned cDNA libraries, or directly from cellular
mRNA as described above. Target RNA for array hybridization was prepared by
incorporating fluorescently labeled ribonucleotides in an in vitro transcription (IVT)
reaction and then randomly fragmenting the RNA to an average size of 30 - 100 bases.
Samples were hybridized to arrays in a self-contained flow cell (volume ~200 µL) for
times ranging from 30 minutes to 22 hours. Fluorescence imaging of the arrays was
accomplished with a scanning confocal microscope (Molecular Dynamics). The entire
array was read at a resolution of 11.25 µm (~80-fold oversampling in each of the 100 x
100 µm synthesis regions) in less than 15 minutes, yielding a rapid and quantitative
measure of each of the individual hybridization reactions.

1) Specificity of Hybridization

In order to evaluate the specificity of hybridization, the high density array
described above was hybridized with 50 pM of the RNA sense strand of IL-2, IL-3, IL-4,
IL-6, Actin, GAPDH and Bio B or IL-10, IL-12p40, GM-CSF, IFN-γ, TNF-α,
mCTLA8 and Bio B. The hybridized array showed strong specific signals for each of
the test target nucleic acids with minimal cross hybridization.

2) Detection of Gene Expression levels in a complex target sample.

To determine how well individual RNA targets could be detected in the
presence of total mammalian cell message populations, spiking experiments were carried
out. Known amounts of individual RNA targets were spiked into labeled RNA derived
from a representative cDNA library made from the murine B cell line T10. The T10
cell line was chosen because, of the cytokines being monitored, only IL-10 is expressed
at a detectable level.

Because simply spiking the RNA mixture with the selected target genes 
and then immediately hybridizing might provide an artificially elevated reading relative 
to the rest of the mixture, the spiked sample was treated to a series of procedures to 
mitigate differences between the library RNA and the added RNA. Thus the "spike"
was added to the sample which was then heated to 37°C and annealed. The sample was
then frozen, thawed, boiled for 5 minutes, cooled on ice and allowed to return to room 
temperature before performing the hybridization. 

Figure 2A shows the results of an experiment in which 13 target RNAs
were spiked into the total RNA pool at a level of 1:3000 (equivalent to a few hundred
copies per cell). RNA frequencies are given as the molar amount of an individual RNA
per mole of total RNA. Figure 2B shows a small portion of the array (the boxed region
of 2A) containing probes specific for interleukin-2 and interleukin-3 (IL-2 and IL-3)
RNA, and Figure 2C shows the same region in the absence of the spiked targets. The
hybridization signals are specific as indicated by the comparison between the spiked and
unspiked images, and perfect match (PM) hybridizations are well discriminated from
mismatches (MM) as shown by the pattern of alternating brighter rows (corresponding
to PM probes) and darker rows (corresponding to MM probes). The observed variation
among the different perfect match hybridization signals was highly reproducible and
reflects the sequence dependence of the hybridizations. In a few instances, the perfect
match (PM) probe was not significantly brighter than its mismatch (MM) partner
because of cross-hybridization with other members of the complex RNA population.
Because the patterns are highly reproducible and because detection does not depend on
only a single probe per RNA, infrequent cross hybridization of this type did not preclude
sensitive and accurate detection of even low level RNAs.

Similarly, infrequent poor hybridization due to, for example, RNA or
probe secondary structure, the presence of polymorphism or database sequence errors
does not preclude detection. An analysis of the observed patterns of hybridization and
cross hybridization led to the formulation of general rules for the selection of
oligonucleotide probes with the best sensitivity and specificity described herein.

3) Relationship Between Target Concentration and Hybridization Signal

A second set of spiking experiments was carried out to determine the
range of concentrations over which hybridization signals could be used for direct
quantitation of RNA levels. Figure 3 shows the results of experiments in which the ten
cytokine RNAs were spiked together into 0.05 mg/ml of labeled RNA from the B cell
(T10) cDNA library at levels ranging from 1:300 to 1:300,000. A frequency of
1:300,000 is that of an mRNA present at less than a few copies per cell. In 10 µg of
total RNA and a volume of 200 µL, a frequency of 1:300,000 corresponds to a
concentration of approximately 0.5 picomolar and 0.1 femtomole (~6 x 10^7 molecules
or about 30 picograms) of specific RNA.
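
The figures quoted above can be checked with a short calculation. The sketch below is illustrative only: the average RNA length (1000 nucleotides) and the mass per nucleotide (330 g/mol) are assumptions chosen for the example and are not stated in the text, and the function name is hypothetical.

```python
AVOGADRO = 6.022e23  # molecules per mole


def spiked_rna_amounts(total_rna_g, volume_l, frequency,
                       avg_length_nt=1000, nt_mass_g_per_mol=330.0):
    """Estimate the amount of one RNA species present at a molar frequency.

    avg_length_nt and nt_mass_g_per_mol are illustrative assumptions; the
    target RNA is assumed to have the same length as the population average.
    """
    avg_molar_mass = avg_length_nt * nt_mass_g_per_mol  # g/mol
    total_mol = total_rna_g / avg_molar_mass            # moles of all RNA
    specific_mol = total_mol * frequency                # moles of the target
    return {
        "femtomoles": specific_mol * 1e15,
        "molecules": specific_mol * AVOGADRO,
        "picomolar": specific_mol / volume_l * 1e12,
        "picograms": specific_mol * avg_molar_mass * 1e12,
    }


# 10 µg total RNA in 200 µL at a frequency of 1:300,000 (values from the text).
amounts = spiked_rna_amounts(total_rna_g=10e-6, volume_l=200e-6,
                             frequency=1 / 300_000)
for name, value in amounts.items():
    print(f"{name}: {value:.3g}")
```

Under these assumptions the results land close to the quoted 0.1 femtomole, 0.5 picomolar, 6 x 10^7 molecules and roughly 30 picograms.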

Hybridizations were carried out in parallel at 40°C for 15 to 16 hours.
The presence of each of the 10 cytokine RNAs was reproducibly detected above the
background even at the lowest frequencies. Furthermore, the hybridization intensity was
linearly related to RNA target concentration between 1:300,000 and 1:3000 (Figure 3).
Between 1:3000 and 1:300, the signals increased by a factor of 4 - 5 rather than 10
because the probe sites were beginning to saturate at the higher concentrations in the
course of a 15 hour hybridization. The linear response range can be extended to higher
concentrations by reducing the hybridization time. Short and long hybridizations can be
combined to quantitatively cover more than a 10^4-fold range in RNA concentration.

Blind spiking experiments were performed to test the ability to
simultaneously detect and quantitate multiple related RNAs present at a wide range of
concentrations in a complex RNA population. A set of four samples was prepared that
contained 0.05 mg/ml of sense RNA transcribed from the murine B cell cDNA library,
plus combinations of the 10 cytokine RNAs each at a different concentration. Individual
cytokine RNAs were spiked at one of the following levels: 0, 1:300,000, 1:30,000,
1:3000, or 1:300. The four samples plus an unspiked reference were hybridized to
separate arrays for 15 hours at 40°C. The presence or absence of an RNA target was
determined by the pattern of hybridization and how it differed from that of the unspiked
reference, and the concentrations were determined from the intensities. The concentrations of
each of the ten cytokines in the four blind samples were correctly determined, with no
false positives or false negatives.

One case is especially noteworthy: IL-10 is expressed in the mouse B
cells used to make the cDNA library, and was known to be present in the library at a
frequency of 1:60,000 to 1:30,000. In one of the unknowns, an additional amount of
IL-10 RNA (corresponding to a frequency of 1:300,000) was spiked into the sample.
The amount of the spiked IL-10 RNA was correctly determined, even though it
represented an increase of only 10 - 20% above the intrinsic level. These results
indicate that subtle changes in expression are sensitively determined by performing
side-by-side experiments with identically prepared samples on identically synthesized
arrays.
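
The 10 - 20% figure follows directly from the ratio of the spiked frequency to the intrinsic frequency. A minimal sketch of that arithmetic, using a hypothetical helper name:

```python
def fractional_increase(spike_frequency: float, intrinsic_frequency: float) -> float:
    """Spiked amount expressed as a fraction of the amount already present."""
    return spike_frequency / intrinsic_frequency


spike = 1 / 300_000  # frequency of the added IL-10 RNA (from the text)
for intrinsic in (1 / 60_000, 1 / 30_000):  # intrinsic range (from the text)
    percent = 100 * fractional_increase(spike, intrinsic)
    print(f"intrinsic 1:{round(1 / intrinsic):,} -> increase of {percent:.0f}%")
```

The lower intrinsic level (1:60,000) gives the larger relative change, and the higher level (1:30,000) the smaller one.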

Example 2

T Cell Induction Experiments Measuring Cytokine mRNAs as a Function of Time
Following Stimulation.

The high density arrays of this invention were next used to monitor
cytokine mRNA levels in murine T cells at different times following a biochemical
stimulus. Cells from the murine T helper cell line (2D6) were treated with the phorbol
ester 4-phorbol-12-myristate 13-acetate (PMA) and a calcium ionophore. Poly(A)+
mRNA was then isolated at 0, 2, 6 and 24 hours after stimulation. Isolated mRNA
(approximately 1 µg) was converted to labeled antisense RNA using a procedure that
combines a double-stranded cDNA synthesis step with a subsequent in vitro transcription
reaction. This RNA synthesis and labeling procedure amplifies the entire mRNA
population by 20 to 50-fold in an apparently unbiased and reproducible fashion (Table
2).

The labeled antisense T-cell RNA from the four time points was then 
hybridized to DNA probe arrays for 2 and 22 hours. A large increase in the γ-interferon
mRNA level was observed, along with significant changes in four other cytokine
mRNAs (IL-3, IL-10, GM-CSF and TNFα). As shown in Figure 4, the cytokine
messages were not induced with identical kinetics. Changes in cytokine mRNA levels of
less than 1:130,000 were unambiguously detected along with the very large changes
observed for γ-interferon.

These results highlight the value of the large experimental dynamic range 
inherent in the method. The quantitative assessment of RNA levels from the 
hybridization results is direct, with no additional control hybridizations, sample 
manipulation, amplification, cloning or sequencing. The method is also efficient. Using 
current protocols, instrumentation and analysis software, a single user with a single
scanner can read and analyze as many as 30 arrays in a day. 

Example 3 

Figure 5 shows an array that contains over 65,000 different
oligonucleotide probes (50 µm feature size) following hybridization with an entire
murine B cell RNA population. Arrays of this complexity were read at a resolution of
7.5 µm in less than fifteen minutes. The array contains probes for 118 genes including
12 murine genes represented on the simpler array described above, 102
additional murine genes, three bacterial genes and one phage gene. There are
approximately 300 probe pairs per gene, with the probes chosen using the selection rules
described herein. The probes were chosen from the 600 bases of sequence at the 3' end
of the translated region of each gene. A total of 21 murine RNAs were unambiguously
detected in the B cell RNA population, at levels ranging from approximately 1:300,000
to 1:100.

Labeled RNA samples from the T cell induction experiments (Fig. 4)
were hybridized to these more complex 118-gene arrays, and similar results were
obtained for the set of genes in common to both chip types. Expression changes were
unambiguously observed for more than 20 other genes in addition to those shown in
Figure 4.

To determine whether much smaller sets of probes per gene are sufficient
for reliable detection of RNAs, hybridization results from the 118 gene chip were
analyzed using ten different subsets of 20 probe pairs per gene. That is to say, the data
were analyzed as if the arrays contained only 20 probe pairs per gene. The ten subsets
of 20 pairs were chosen from the approximately 300 probe pairs per gene on the arrays.
The initial probe selection was made utilizing the probe selection and pruning algorithms
described above. The ten subsets of 20 pairs were then randomly chosen from those
probes that survived selection and pruning. Labeled RNAs were spiked into the murine
B cell RNA population at levels of 1:25,000, 1:50,000 and 1:100,000. Changes in
hybridization signals for the spiked RNAs were consistently detected at all three levels
with the smaller probe sets. As expected, the hybridization intensities do not cluster as
tightly as when averaging over larger numbers of probes. This analysis indicates that
sets of 20 probe pairs per gene are sufficient for the measurement of expression changes
at low levels, but that improvements in probe selection and experimental procedures
are preferred to routinely detect RNAs at the very lowest levels with such small probe
sets. Such improvements (including, but not limited to, higher stringency hybridizations
coupled with use of slightly longer oligonucleotide probes, e.g., 25-mer probes) are in
progress.
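
The subset analysis described above can be sketched as follows. This is a minimal illustration, not the analysis software used in the experiments: the per-gene summary (the mean of the PM minus MM differences) is an assumed statistic, the function name is hypothetical, and the intensities are synthetic.

```python
import random
from statistics import mean


def subset_means(pm, mm, n_subsets=10, subset_size=20, seed=0):
    """Mean PM-MM difference for random subsets of probe pairs.

    pm and mm are equal-length lists of intensities for one gene. Each subset
    is drawn without replacement from the available probe pairs.
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have the same length")
    rng = random.Random(seed)
    diffs = [p - m for p, m in zip(pm, mm)]
    indices = range(len(diffs))
    return [
        mean(diffs[i] for i in rng.sample(indices, subset_size))
        for _ in range(n_subsets)
    ]


# Synthetic intensities for ~300 probe pairs (illustrative values only).
data_rng = random.Random(1)
pm = [data_rng.gauss(500, 100) for _ in range(300)]
mm = [data_rng.gauss(200, 100) for _ in range(300)]

full_set = mean(p - m for p, m in zip(pm, mm))
subsets = subset_means(pm, mm)
print(f"all 300 pairs: {full_set:.1f}")
print("20-pair subsets:", ", ".join(f"{s:.1f}" for s in subsets))
```

The spread among the ten subset values gives a rough sense of how much less tightly 20-pair estimates cluster than the estimate from the full probe set.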

Example 4 
Scale Up to Thousands of Genes

A set of four high density arrays, each containing 25-mer oligonucleotide
probes to approximately 1650 different human genes, provided probes to a total of 6620 genes.
There were about 20 probes for each gene. The feature size on arrays was 50 microns.
This high density array was successfully hybridized to a cDNA library using essentially the
protocols described above. Similar sets of high density arrays containing oligonucleotide
probes to every known expressed sequence tag (EST) are in preparation.

Example 5

[...] here is intrinsically parallel and readily scalable [...] containing as many as
400,000 probes in an area of 1.6 cm² [...] redundancy, a set of four such arrays could
cover the more than 40,000 human genes for which there are [...], and new
ESTs can be incorporated as they become available. Because of the combinatorial nature
of the chemical synthesis, arrays of [...]

The quantitative monitoring of expression levels for large numbers [...] elucidate the
underlying physiology of the cell.

Example 6
Probe Selection Using a Neural Net

A neural net can be trained to predict the hybridization and cross
hybridization intensities of a probe based on the sequence of bases in the probe, or on other
probe properties. The neural net can then be used to pick an arbitrary number of the "best"
probes. [...] between predicted intensity and measured
intensity, with a better model for cross hybridization than hybridization.

A) Input/output mapping.

The neural net was trained to identify the hybridization properties of 20-mer
probes. The 20-mer probes were mapped to an eighty bit long input vector, with the first
four bits representing the base in the first position of the probe, the next four bits
representing the base in the second position, etc. Thus, the four bases were encoded as
follows:

A: 1000
C: 0100
G: 0010
T: 0001

The neural network produced two outputs: hybridization intensity, and
crosshybridization intensity. The output was scaled linearly so that 95% of the outputs from
the actual experiments fell in the range 0 to 1.
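
The input mapping described above can be written out as a short sketch. The bit patterns come from the text; the function name and the example sequence are hypothetical.

```python
# Four-bit code for each base, as listed in the text.
ENCODING = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def encode_probe(probe: str) -> list:
    """Map a 20-mer to an 80-element input vector, four bits per position."""
    probe = probe.upper()
    if len(probe) != 20:
        raise ValueError("expected a 20-mer probe")
    bits = []
    for base in probe:
        bits.extend(ENCODING[base])  # raises KeyError for non-ACGT characters
    return bits


vector = encode_probe("ACGTACGTACGTACGTACGT")  # hypothetical probe sequence
print(len(vector))
print(vector[:8])
```

Each position contributes exactly one set bit, so a valid input vector always contains twenty ones.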

B) Neural net architecture. 

The neural net was a backpropagation network with 80 input neurons, one
hidden layer of 20 neurons, and an output layer of two neurons. A sigmoid transfer
function was used, s(x) = 1/(1 + exp(-1 * x)), which scales the input values from 0 to 1 in a
non-linear (sigmoid) manner.
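
A forward pass through a network of this shape can be sketched in a few lines. This is a minimal illustration under stated assumptions: the sigmoid is applied at both the hidden and output layers, each neuron has a bias term, and the weights below are random placeholders rather than trained values. All function names are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Transfer function from the text: s(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Apply one fully connected sigmoid layer.

    weights holds one row of input weights per neuron in the layer.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def predict(input_bits, w_hidden, b_hidden, w_out, b_out):
    """80 inputs -> 20 hidden neurons -> 2 outputs."""
    hidden = layer(input_bits, w_hidden, b_hidden)
    return layer(hidden, w_out, b_out)


# Placeholder weights for illustration only; these are not trained values.
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(80)] for _ in range(20)]
b_hidden = [0.0] * 20
w_out = [[rng.uniform(-0.1, 0.1) for _ in range(20)] for _ in range(2)]
b_out = [0.0] * 2

example_input = [1, 0, 0, 0] * 20  # 80 bits: base A (1000) at every position
hybridization, cross_hybridization = predict(
    example_input, w_hidden, b_hidden, w_out, b_out
)
print(hybridization, cross_hybridization)
```

Because the sigmoid is bounded, both outputs always fall strictly between 0 and 1, which matches the scaled output range described for the training data.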

C) Neural net training.

The network was trained using the default parameters from NeuralWorks
Professional 2.5 for a backprop network. (NeuralWorks Professional is a product of
NeuralWare, Pittsburgh, Pennsylvania, USA.) The training set consisted of approximately
8000 examples of probes, and the associated hybridization and crosshybridization
intensities.
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intensity, with a better model for cross 






The neural net weights are provided in two matrices: an 81 x 20 matrix, Table 3
(weights_1), and a 2 x 20 matrix, Table 4 (weights_2).

Table 3. Neural net weights (81 x 20 matrix) (weights_1).



-0.0316746 

0.19370709 

0.02240546 

0.16692482 

0.02129388 

0.03684745 

0.00603615 

0.11111762 

001354388 

-0.0635492 

0.18790121 

0.02378313 

-0.0403537 

-0.0694051 



-0.0263491 
-0.0515666 
0.08460676 
-00913482 
012105247 
-0.0714359 
0.04986877 
0.12571541 
01131407 
-0.0227965 
0.09624594 
010295142 
0.23566079 
-0.0637478 



-0.0731941 

0.06500423 

0.03036973 

0.17097448 

0.05442215 

0.08683836 

0.08829379 

0.0749867 

0.05022619 

0.08078995 

0.03405219 

0.11517255 

0.14236264 

-0.0866363 

-0.0163019 

0.38030735 

0.23144296 

0.46158856 

0.45084599 

0.55080342 

0.36848074 

0.47133151 

0.46017882 



0.08858298 

0.11003297 

0.06836637 

-0.007098 

0.23686385 

014047802 

017881326 

0.08564588 

0.14544216 

-0.0022168 

0.06140256 

017431773 

0.17182963 

011008894 

0.06256609 

0.28241798 

-03207987 

0.20649959 

-0.5829023 

0.30968052 

-0.5196409 

0.30909833 

-05331213 



0.15907079 
0.06444275 
0.14313674 
0.05571244 
0.1405973 
0.02903421 
0.02134438 
0.09278143 
0.06123798 
0.1081195 
-0.0865264 
005553147 
0.10335726 
0.2687766= 

0.39719725 
0.0403917 
0.02345118 
-0.0348659 
0.01979881 
0.00982503 
0.12465772 
0.05334799 
0.03519877 
0.05439407 
001802093 
0.09664405 
0.02306779 
0.40543473= 



-00353881 

-0.0480836 

006798329 

0.22345543 

-0.0066357 

0.09420238 

00852259 

0.11373715 

0.14818664 

013419148 

-00126238 

-0.0193289 

0.07325625 



-00529314 

029237783 

0.06746746 

0.04707823 

-00760119 

012839544 

01 3453935 

0.03250757 

0.07090721 

008916269 

0 1149701 9 

-0.0627925 

011329328 



0.09014647 

-0.034054 

0.033717 

-0.0035547 

0.11165894 

008542864 

0.03089394 

-00460193 

005089445 

-0.010634 

-0.0057307 

-0.024633 

0.2555581 



-0.0709359 
0.02953459 
0.0206452 
0.09989586 
-9.80E-06 
0.11756061 
013134554 
0.14341639 
0.12799838 
-0.0789278 
0.0954654 
0.01782892 
-0.0489743 



014039235 

026901209 

-0.0079707 

007417496 

-00549301 

009054346 

009500015 

011468539 

001427337 

007312368 

0.00130152 

0.03840308 

-00006051 



0.23244983 

-0.0605089 

0.20967795 

-0.1236805 

0.08891765 

-0.028868 

0.04572553 

0.14277624 

0.16172577 

0.11417327 

-0.035995 

0.05180788 

0.19077648 



01 6058824 
0.2882407 
0.56366867 
0.35099933 
0.51297456 
0.54485208 
0.33829662 
0.37790757 
0.60684419 



0.14149499 

-0.2227429 

0.35976714 

-05071837 

0.33494622 

-0.7155912 

021612473 

-0464661 

0.47586009 



015698175 

0.34799534 

0.20325871 

056459975 

0.43086055 

0.30799151 

0.41646513 

050172138 

0.28597337 



-0.1197781 

0.38490915 

-0.343972 

0.21605791 

-0.5538613 

0.29871368 

-05573701 

0.21158406 

-0.3345993 
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0.33042327 

0.19591335 

0.19672842 

0.1710967 

0.03326527 

-0.0752053 

■0.0838031 

-0.2039919 

-0.0482095 

-0.0065265 

0.09420983 

0.00565713 

0.11275655 

-0.0850152 

0.1065109 

-0.0922655 

0.19862956 

-0.0721622 

0.20344114 

0.02848385 

-0.0742345 

-0.0747124 

-0.1090026 

0.07326921 

-0.0586419 

0.05150735 

-0.0001517 

0.12470152 

-0.049451 

0.18880892 

0.0234996 

0.16712189 

-0.0487184 

0.146753 

-0.0739603 

-0.1548294 

-0.1728483 

-0.0950026 

0.07691807 

-0.035349 

0.04664604 

0.00194441 

0.13777457 

-0.0514387 



0.4072904 

-0.4028497 

0.16133355 

-0.2728708 

0.22045346 

-0.0571054 
0.01667063 
-0.0532S26 
0.04316666 
-0.2011867 
-0.0010159 
-0.1990354 
0.01772332 
•J0.19310I2 
0.07205399 
-0.1478272 
-0.0502828 
-0.1506944 
-0.061502 

0.00254791 

-0.0545447 

0.13325705 

-0.0988943 

0.02654305 

-0.08015 

-0.1449667 

-0.0521925 

-0.3589714 

0.05717351 

-0.3259364 

-0.1177034 

-0.0122822 

0.01467591 

-0.0931665 

0.17018235 

-0.0908961 

0.12621336 

-0.1562225 

0.13016214 

-0.302975 

0.08887579 

-0.1631221 

0.00339417 

-0.0722146 



0.24270254 

0.30585453 

0.21780767 

0.1234024 

0.98782647= 

-0.1834571 

-0.0945634 

.0.0828366 

-0.1732933 

-0.0434558 

-0.1768979 

0.11568499 

-0.0016695 

0.08498721 

-0.1304159 

0.08858409 

-0.11447 

0.14910588 

-0.1647823= 

-0.0646306 

-01119258 

-0.0508435 

-0.0445145 

-0.1239398 

-0.0073617 

0.06144469 

0.21106339 

-0.0061972 

0.14784867 

0.04754021 

0.02549919 

-0.109654 

-0.0759871= 

-0.1475015 

-0.0636651 

-0.0415557 

-0.1321529 

-0.0917397 

0.10801306 

0.03706082 

-0.0210248 

0.11259725 

-0.2007502 

0.07706029 



-0.3750777 
0.35896543 
-0.2419563 
0.06987085 



0.14263187 

-0.1137057 

0.1373803 

0.0550463 

-0.0369132 

-0.2365085 

-0.0690084 

-0.24901 1 

0.03673514 

-0.1723315 

0.14206541 

-0.1440073 

0.03297219 



0.02634032 

0.10765317 

-0.1761459 

0.03802977 

0.03043288 

-0.1682889 

0.1005446 

-0.4393073 

0.07370338 

-0.3082401 

-0.0576587 

-0.1671077 

-0.0327367 



0.07284982 

0.04693379 

0.04915113 

-0.1091831 

0.18711324 

-0.3151104 

0.12322487 

-0.1427284 

-0.0984519 

-0.0703103 

0.04593663 



0.14083703 
0.24851802 
0.17847325 
0.1741322 



-0.0715346 

-0.1040308 

-0.0562212 

-0.0526818 

-0.0196296 

-0.0150508 

-0.1509431 

0.09066539 

-0.1446398 

0.09151162 

-0.0314846 

0.01366408 

-0.0266356 



-0.0654473 

-0.0606677 

-0.0883804 

-0.0484086 

0.09781751 

0.00400978 

0.22570252 

0.0053312 

0.25447422 

0.01207511 

0.02376083 

0.00582423 

0.01481733 



-0.0609536 

-0.2586751 

-0.0436857 

-0.0989133 

0.04599057 

0.0105284 

0.07198878 

0.09078772 

-0.0939511 

0.1548807 

-0.2334163 



0.30998308 
-0.2937264 
0.07593013 
0.05922241 



-0.0524248 

0.04263301 

-0.2127942 

0.06739104 

-0.1314755 

0.14120786 

-0.0575663 

0.05357879 

-0.199778 

0.05596334 

-0.1985286 

0.11101657 

-0.2501774 



0.04731949 

0.05693235 

-0.0777852 

-0.0337959 

0.02590732 

0.01282504 

-0.3763289 

0.13283829 

-0.3289591 

-0.1141143 

-0.2828108 

-0.0715723 

-0.0636454 



-0.0945313 

0.15550844 

-0.031472 

0.0294641 

-0.2039073 

0.10938062 

-0.2535323 

0.08646259 

-0.218395 

0.13540466 

-0.0250262 
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0.0994828 -0.035077 

-0.1571046 -0.1713289 

0.13411179 -0.0159559 

-0.0304715 -0.0845574 

0.13328855 -0.1492282 

0.0217719 -0.3102229 

0.04117605 0.03997391 

0.08965616 -0.1572192 

0.08670026 0.03785197 

0.00865917 -0.2995701 

-0.1322389 0.21433547 

0.1623435 -0.3362183 

-0.0887844 0.07691832 

0.08622501 -0.2421202 

-0.0290916 -0.0839412 

0.17453311 -0.1529943 

-0.158326 -0.0I49I14 




-0.106266 -0.059766 

0.14155054 0.00283311 
-0.1296399= 

0.17682472 -0.0552084 

0.11350834 -0.1121938 

0.18922243 -0.0940011 

0.06022124 -0.1808036 

0.00942572 0.07957069 

0.21052985 -0.3564453 

-0.0835971 0.14536868 

0,08046963 -0.1548838 

-0.1335399 0.10284293 

0.11459036 -0.056257 

0.00845924 -0.0151014 

0.10590381 -0.1593935 

0.02726452 0.06178628 
-0.1479269= 



0.11429903 

0.33529782 

0.09062219 

0.49959001 

0.52447188 

0.78773916 

0.47296903 

0.80210346 

0.45408139 

0.56882453 

0.4490836 

0.39600489 

0.35209069 

0.13266218 

-0.0112394 

0.09505147 

0.16811867 

-0.0380792 

0.42699403 

-0.0381787 

0.74187845 

0.02410492 

0.7597335 

0.15992545 

0.25537059 

-0.1888455 

0.08058657 



-0.0432327 

0.24581231 

-0.2974442 

0.22195752 

-0.5555881 

0.45518181 

-0.672706 

0.40167108 

-0.7316507 

0.29653791 

-0.4754149 

0.24787127 

-0.203685 

0.20236486 

0.01601524 

-0.0220034 

-0.4498019 

-0.0468904 

-0.6348544 

0.09532065 

-0.8996705 

-0.0632124 

-0.6287012 

-0.1780757 

-0.4526066 

0.1974159 

-0.0768841 



0.14520219 

0.07311282 

0.46336258 

0.32254469 

0.68481833 

0.71273196 

0.69020337 

0.50383294 

0.48975253 

0.4472059 

0.46366793 

0.20359448 

0.25115264 

1.1078833= 

0.11363719 

0.0714381 

0.10313182 

0.37975076 

0.00025528 

0.50065184 

0.03180836 

0.73732454 

0.03615654 

0.3820785 

-0.0761788 

0.01620384 

-0.316401 



0.51860482 

-0.2268714 

0.17145836 

-0.4994924 

0.20251468 

-0.7655811 

0.37193877 

-0.6195157 

0.47984859 

-0.5177853 

0.31378582 

-0.203447 

0.21313109 



0.13616422 
0.01067419 



0.07044557 

0.02089526 

0.08787836 

0.04742034 

0.12980177 

0.01492627 

0.08446889 

-0.021533 

0.16658102 

0.01970494 

0.19088623 

-0.0399097 

0.06624542 



0.19151463 

0.3)717882 

0.32802406 

0.75497276 

0.39860719 

0.7155844 

0.47959387 

0.80366057 

0.33738744 

0.36228263 

0.48470935 

0.25734761 

0.12461348 



0.22308858 
-0.360891 



-0.1482136 

0.00104415 

-0.1835242 

-0.0744867 

-0.2440033 

0.04286519 

-0.1689682 

0.0558197 

-0.3004514 

0.08940192 

-0.1967196 

-0.0861852 

0.01004315 



-0.1127352 

0.35736522 

-0.3898261 

0.35112098 

-0.7198414 

0.39701831 

-0.9032337 

0.3884458 

-0.5510914 

0.40129057 

-0.2453159 

0.17168433 

0.10632347 



-0.1440069 0.05522444 -0.0711868 

-0.1994763 0.12304886 -0.1611445 

-0.0149997 0.47659361 -0.4639786 

-0.7120748 -0.1078557 0.10635795 

0.06202703 0.57867163 -0.6733171 

-0.7413587 -0.0193744 -0.1180785 

0.04010354 0.82366729 -0.6429569 

-0.8188882 0.04538922 -0.1471086 

-0.1248241 0.56647652 -0.6294683 

-0.5642462 -0.0609947 -0.0350918 

-0.0242514 0.35473567 -0.3512402 

-0.1306533 -0.1468564 0.25235301 

0.09779498 0.08537519 -0.0738487 
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-0.2839164 0.12684187 -0.2450078= 






-0.1147067 

-0.5139894 

0.19038832 

-0.9312411 

-0.5433689 

-0.9422795 

-0.68961SS 

-1.0231192 

-0.6568274 

-0.7811472 

-0.4704399 

-0.7735854 

-0.1544528 

-0.4815812 



-0.0084124 

-0.6221746 

0.55414283 

-0.410718 

0.92539561 

•0.6914638 

1.1251011 

-0.5556009 

1.1967098 

-0.5740913 

0.51728982 

-0.3031097 

0.2042688 

-0.5319371 



-0.5239977 

-0.3979228 

-1.1652025 

-0.1498093 

-0.9013531 

-0.7839714 

-0.8161536 

-0.7499282 

-1.150661 

-0.4527726 

-0.545236 

-0.4083092 

-0.8989772 

-1.3798244= 



-0.5021591 

0.30136263 

-0.3686967 

0.55332947 

-0.6145319 

1.4393494 

-0.8204682 

1.281976 

-0.5503616 

0.64911795 

-0.8311051 

-0.0152683 

-0.3088974 



0.02636886 

-0.742976 

-0.4750175 

-1.0870041 

-0.5512772 

-0.7092296 

-0.8957642 

-0.9347371 

-0.6640182 

-0.6970047 

-0.4240301 

-0.2330878 

-0.2014994 



0.1470097 

-0.4011821 

0.54713631 

-0.4378341 

1.0310978 

-0.894987 

1.3315079 

-0.6562014 

0.84698498 

-0.5759697 

0.37167478 

-0.5839304 

0.11505035 






0.07143499 

0.1549352 

0.44703272 

0.2595928 

0,53066176 

0.1702383 

0.5403164 

-0.092208 

0.22238699 

-0.180493 

0.17421109 

-0.1982318 

0.05979542 

-0.1978694 



-0.1589592 

-0.0608833 

-0.6194252 

-0.119705 

-0,9705743 

0.02221953 

-0.5077381 

0.21902563 

-0.156256 

0.17164391 

-0.0730809 

0.06996673 

-0.0623277 

0.05119598 



0.04816094 

0.21059546 

0.19459446 

0.4913742 

0.1324198 

0.44412452 

0.00849557 

0.25788471 

-0.2092034 

0.15690604 

-0.3717274 

0.19735655 

-0.2521037 

-0.2067173= 



-0.0301291 

-0.4705076 

-0.0523894 

-0.8455008 

0.08982921 

-0.7700244 

0.1611405 

-0.3861519 

0.16458821 

-0.0254563 

0.1436436 

0.05625506 

0.0944353 



0.15144217 

0.16360784 

0.31194624 

0.15694356 

0.43900672 

0.10496679 

0.31764683 

-0.2022993 

0.20111787 

-0.1990184 

-0.0215865 

-0.241524 

-0.0492548 



-0.3037405 
-0.0684895 
-0.8030509 
•0.0023983 
-0.8588745 
0.14137991 
-0.5240273 
0.13711917 
-0.1418906 
0.10211211 
-0.2363243 
0.12768924 
0.05238663 






0.06230025 

0.1073643 

-0.0272076 

0.33091745 

0.01069087 

0.11231339 

-0.0213237 

0.12980145 

0.15833771 

-0.0696303 

0.05538817 

•0.0677462 

0.14652038 

-0.1929855 



-0.0752745 

-0.090154 

-0.1014201 

-0.0610701 

0.02569587 

-0.0392407 

-0.0261696 

-0.038394 

0.01835199 

0.03802699 

0.01067943 

-0.0772208 

0.06084725 

0.00694158 



0.32974288 

-0.0938452 

0.19723812 

0.01335303 

0.11676744 

0.06117272 

0.09474246 

0.08167668 

0.04420554 

0.0806741 

0.04131892 

0.16641215 

-0.1150111 

0.26604816= 



0.00985043 

0.00704324 

-0.0935401 

0.02156818 

-0.0213131 

-0.0234323 

-0.0100756 

-0.0105376 

0.02605363 

0.03993953 

-0.0267609 

0.09142463 

-0.0687876 



0.07881941 

0.2569764 

0.0913924 

0.21619918 

0.1322203 

0.14693312 

0.10580003 

0.02142166 

0.27427858 

-0.0121658 

0.14418064 

0.02115551 

0.10878915 



•0.0835249 

0.08700065 

-0.0728388 

-0.0909865 

0.11848255 

0.13509636 

-0.0147534 

-0.0161705 

0.05774866 

0.07568218 

0.0897231 

•0.0876383 

0.32776353 
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-0.0786668 
-0.0029815 
-0.1259448 
-0.1091328 
0.00333312 
0.14768517 
0.0611263 
0.09951859 
0.05554885 
0.01806534 
0.10942505 
-0.0365961 
0.01934035 
-0.0525739 

0.19904579 
0.01703933 
0.39202845 
0.65535748 
0.95144385 
0.99852085 
1.2572207 
0.73526824 
0.95438999 
0.45917389 
0.33946255 
0.19083619 
0.30190364 
0.18584418 

0.13698889 
0.31540671 
0.00119518 
0.74023747 
0.06987014 
0.7840901 
0.05702339 
0.70519674 
0.11747536 
0.81293154 
0.24770954 
0.54755467 
0.03049339 
0.1008145 



0.05454836 
-0.0837616 
-0.0845026 
0.0090488 
•0.2812204 
0.02989549 
-0.1895157 
0.14843601 
-0.3743193 
0.09599103 
-0.0473638 
-0.0962418 
-0.0073082 
0.06086259 

-0.2001437 
0.06875326 
-0.6033413 
0.32430753 
-1.2075449 
0.48870567 
-1.5854638 
0.31977594 
-1.2543333 
0.27823627 
-0.5412283 
0.37056214 
-0.3655235 
0.34009755 

-0.0798945 
0.08274947 
-0.1978176 
0.38564634 
-0.5168169 
0.4372991 
-0.5161278 
0.15731441 
•0.612968 
0.18651071 
-0.4320194 
0.08819038 
-0.1913544 
0.01412579 



•0.0834711 
0.02468397 
0.10171869 
0.06142418 
0.02039073 
0.09454407 
0.08583955 
0.12351749 
-0.0205463 
-0.0570596 
0.01151769 
0.01007566 
-0.0489736 
-0.1788069= 

0.04977471 
0.09066898 
0.57940209 
0.64831889 
0.94851351 
1.7470727 
0.89351815 
1.2270083 
0.55854511 
026928344 
0.1085042 
0.24114503 
0.33355939 
4.5490937= 

0.3366704 
0.11212139 
0.59532708 
0.03748908 
1.0081589 
0.13783893 
0.66693234 
0.08724558 
0.98160452 
0.03182137 
0.72470272 
0.22105552 
0.4782092 
0.42727205= 



0.07707115 

0.03531792 

-0.0541042 

-0.167912 

-0.052828 

-0.1860176 

0.09382812 

-0.1327625 

0.12675567 

-0.1523381 

0.09737793 

-0.0049753 

0.10457312 



0.26628217 

-0.2003548 

-0.0460919 

-1.0950515 

-0.0852669 

-1.7586045 

0.39586932 

-1.2818555 

0.1672449 

-0.9804664 

0.44658452 

-0.3020035 

0.44246852 



0.17313539 
-0.428847 
-0.0309942 
-0.6475483 
-0.0517421 
-0.8574924 
-0.0496743 
-0.7325026 
0.02407174 
-0.7051651 
0.12951751 
-0.3489864 
-0.098419 



0.05659099 

-0.1437671 

0.05257236 

-0.098868 

-0.0439769 

-0.0505908 

-00001466 

0.10949049 

0.0775801 

0.08384241 

0.07082167 

0.01404589 

-0.0520154 



0.19910193 

0.26507998 

0.53419203 

0.80829531 

0.94320357 

0.56886804 

1.586942 

0.71813524 

0.56084049 

0.62299174 

0.39120093 

0.39015424 

0.17172456 



0.01228174 

0.57447821 

-0.0107875 

0.87958473 

0.08651814 

0.90612286 

0.07689167 

0.65517086 

0.02613025 

0.89682412 

0.14626819 

0.4620938 

-0.0160188 



-0.0285798 

0.10122854 

0.04065102 

0.02574896 

-0.0458286 

0.088718 

-0.4065202 

0.07129322 

-0.1869074 

0.00704122 

-0.2184597 

-0.0406134 

-0.0454775 



0.15184447 

0.062977] 

-0.7680888 

0.05049393 

-1.680338 

0.66196042 

-1.6365775 

0.37488377 

-0.7980669 

0.53984308 

-0.5676367 

0.09788869 

-0.3479928 



-0.2679709 

-0.0305296 

-0.7312108 

0.05327692 

-0.761238 

0.06334394 

-0.5775976 

0.29064488 

-0,677594 

0.181806 

-0.396433 1 

0.06516677 

0.07177288 



-0.0048454 0.1204864 0.15507312 0.25648347 
-0.0273505 0.10494121 0.1988914 0.09454013 



0.03982652 0.14641231 
-0.0560908 0.07466536 
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0.132S469 

-0.0007111 

•0.1188S3 

0.07947435 

-0.162177 

0.04106503 

-0.0012895 

0.13412228 

-0.142963 

0.19903891 

-0.0027455 

0.25000233 



0.15324508 

0.13285491 

0.26435438 

0.07329605 

0.18712705 

0.08498254 

0.2371086 

0.10756335 

0.09792294 

0.02989559 

0.16604523 

0.05931267 



-0.01398 

-0.1658676 

-0.0775707 

-0.0903666 

0.03216886 

-0.0325038 

0.14713244 

-0.0486093 

0.06907349 

0.15750381 

0.06245366 

0.2288 1882» 




0.08281901 

0.25348473 

0.09143513 

0.10754076 

0.04698242 

0.29328787 

-0.053306 

0.05799349 

0.05942665 

-0.0373194 

-0.0775013 



0.07909692 

0.08835109 

-0.1019902 

0.04456592 

-0.0385783 

0.01249749 

-0.0808243 

0.21323961 

-0.143813 

0.12471988 

-0.0160873 



0.36858437 

0.16466415 

0.29236633 

0.18368921 

0.2276271 

0.10016124 

0.28909287 

-0.0118695 

0.21673524 

0.10462648 

0.21550164 






0.04679342 

-0.1704439 

-0.215752 

-0.0430407 

-0.1322077 

0.10109599 

-0.0808031 

0.13912162 

-0.2270383 

0.01596376 

-0.1284984 

0.00538179 

-0.0861699 

0.20031671 

0.37838998 

-0.038453 

0.58336282 

0.07741276 

0.85640681 

-0.0642752 

0.79736245 

0.02592243 

0.88004726 

0.30492786 

0.54989374 

0.17151839 

0.51068121 

-0.1255455 

0.02952595 
-0.6262965 
0.02978903 



0.10158926 

0.302394 

0.32740423 

0.04886867 

0.2981362 

0.23081669 

0.15750171 

0.04256131 

0.22945035 

0.03504543 

0.24145114 

0.05302088 

0.05814215 

0.23140682 

0.00934576 
0.24550894 
-0.2145292 
0.45081589 
-0.6068144 
0.37914035 
-0.7102081 
0.37013471 
-0.6990998 
0.39735735 
-0.5660355 
0.39539635 
-0.3502096 
0.35898197 

-0.0751979 
-0.1423945 
0.20563391 



-0.122116 

-0.0671487 

-0.1597161 

-0.0914212 

0.1254565 

-0.1617257 

0.08072432 

-0.1625126 

0.18167619 

0.00964208 

0.20540115 

-0.1001294 

0.21307872 

0.16010799= 

-0.139213 

0.30729383 

-0.2378269 

0.65251595 

.0.1187844 

0.71409059 

0.14268413 

0.82774776 

0.23456772 

0.55497372 

0.1205707 

0.S0465S24 

-0.2094818 

0.79502285= 

-0,2556099 
-0.0537339 
-0.5457558 



0.23491009 
0.33251444 
0.18950906 
0.28192514 
0.15627012 
0.29508773 
0.12990661 
0.25232118 
0.00080986 
0.11757879 
0.07580803 
0.27505419 
0.01372274 



0.29823828 
-0.2807365 
0.25939462 
-0.4543131 
0.35959438 
-0.7180941 
0.41374633 
-0.8136597 
0.24596012 
-0.6593497 
0.22377795 
-0.3791285 
0.31471297 



-0.3040917 
0.11189342 
-0.3666513 



-0.0625733 
-0.0581705 
-0.1232446 
0.05275658 
0.04116358 
-0.0405337 
-0.1935954 
0.04736055 
-0.1253632 
-0.0230768 
-0.0932236 
0.22654785 
0.04515802 



0.40640026 
-0.0689575 
0.64761585 
-0.0671543 
0.71842372 
0.21169594 
0.75569016 
0.24068722 
0.67229778 
0.20656242 
0.46045718 
0.07184427 
0.18174268 



-0.0942183 
-0.3791296 
-0.1922515 



0.19985424 
0.21095584 
0.27883759 
0.21014904 
0.08507752 
-0.0497829 
0.29120663 
-0.0530935 
0.15695702 
0.04350457 
0.14288881 
0.02395938 
-0.0269269 



-0.067578 

0.26537073 

-0.3581158 

0.48592216 

-0.7140775 

0.27888221 

-0.7394939 

0.45081198

-0.8148533 

0.3752968 

-0.519361 

0.36315975 

-0.1241962 



-0.0541431 
-0.3382006 
0.29512301 
-0.7473708 -0.0415357

0.00290797 0.6284017 

-1.0829539 -0.1822221 

0.06966544 0.75524592 

-0.8823278 -0.3404879 

0.0915129 0.44590429 

-0.499517 -0.4873153 

-0.1106236 0.27437851 

-0.6255118 -0.1046614 

-0.1468192 -0.1719856 

-0.213571 -0.1335077 

0.06424081 -0.0978306 

-0.1032737 0.11563963 

0.05533361 -0.033985 

0.05850215 0.03830531 

-0.012636 -0.1925185 

-0.0395793 0.03069885 

-0.0917266 -0.2185763 

0.23327024 -0.0898143 

0.10926479 -0.1167006 

0.12219627 0.05705986 

-0.1091286 -0.075133 

-0.0210903 0.11607172 

-0.1233738 -0.0760847 

0.06584878 -0.0323083 
0.18283925 0.28153449
-0.6397845 -0.5606785 
-0.1832336 0.49371469 
-0.9053063 -0.5826979 
-0.0334436 0.50130409 
-0.7808504 -0.4399623 
-0.2889721 0.47303999 

-0.6061368 -0.4166524 

-0.2710638 0.26425925 

-0.4140109 -0.1058299 
-0.7155944

-0.1169782 0.13909493 

-0.0709175 -0.028875 

-0.049436 0.11520655 

-0.0893732 -0.0066427 

0.13028348 -0.0045112 

0.07913893 -0.1470363 

0.04743406 -0.0364127 

-0.0578982 -0.2096201 

0.18223672 0.09710353 

-0.0505442 -0.1334345 

0.02949276 -0.0217044 

-0.0943146 -0.1014408 

0.00098273 0.07522969 
-0.0581293
-0.7847292 -0.2313099

-0.1479581 0.57049137 

-0.6362705 -0.2790937 

-0.114608 0.90401584 

-0.57275 -0.3842527 

-0.1189605 0.59226018 

-0.4015501 -0.2875251 

-0.0637606 0.33875695 

-0.4123208 -0.2157291 

0.02873472 -0.1210428 



-0.0838893 -0.1300299 

-0.1718288 -0.026291

-0.0279296 -0.0170352 

0.06969514 0.13403182 

0.05260766 -0.2759708 

0.09080192 0.19741131 

0.00991712 -0.2093729

0.09257686 0.00566842 

0.03838636 -0.2026017 

-0.0204458 0.01167099 

-0.0782921 -0.1160332 

0.02903902 0.02963065 

0.05794976 -0.1959872 



Table 4. Second neural net weighting matrix (2x21) (weights 2).

Vstllir, n -^,\lV^^. 0 20069ir7~r6T3Mi~~::il'om» "1^^ 

-0209??« V^litlnV -^^'28213 -1.0000007 -0.6456627 

-0 209518 1.6362301 -1.9999975 -0.2563241 0 04389827 1 7«»7«1 

2.0453076 0.08412334 -0.1645829= 004389827 1.7597554 

ofSf^SI 068506879 -1.1869608 0.39551663 0.38050765 0 40832204 

n oIi?S? -17462951 0.0818732 6.111361 0.62210494 0 4292T746 

0. 989 988 ut.0000067 -0.5605077 1.3601962 1.7318885 -1 0558798 

3.1242371 0.22860088 1.6726165= 1.0358798 



E) Code for running the net

Code for running the neural net is provided below in Table 5 (neural_n.c)
and Table 6 (lin_alg.c).

Table 5. Code for running the neural net (neural_n.c).

#define local far
#include <windows.h>
#include <alloc.h>
#include "utils.h"
#include <string.h>
#include <ctype.h>
#include <stdio.h>
#include <math.h>
#include <mem.h>
#include "des_util.h"
#include "chipwin.h"
#include "lin_alg.h"

void reportProblem( char local * message, short errorClass );
char iniFileName[] = "designer.ini";

static void sigmoid( vector local * transformMe ){
    short i;

    for( i = 0; i < transformMe->size; i++ )
        transformMe->values[i] = 1/(1 + exp( -1 * transformMe->values[i] ));
}

static short getNumCols( char far * buffer ){
    short count = 1;

    for( ; *buffer != 0; buffer++ )
        if( *buffer == '\t' ) count++;
    return count;
}

static short getNumRows( char far * buffer ){
    char far * last, far * current;
    short count = -1;

    current = buffer;
    do{
        count++;
        last = current;
        current = strchr( last+1, 0 );
    }while( current > last+1 );
    return count;
}

static void readMatrix( matrix local * theMat, char far * buffer ){
    short i, j;
    char far * temp;

    temp = buffer;
}

#define MaxNumLines (20)
#define MaxLineSize (1024)

charfa.*,,ffer- --local Vei,i,.2 

"'"PialUngih; 

-.0),, ^ '^'■°"'^««-*»>-(M«N„^-.„.„^.,^. 

.10,), '*'"^'°""^"»»>-(M«N™ci,«.Ma^ 
    return FALSE;

    return TRUE;
}

short runForward( vector local *input, vector local *output,
                  matrix local *weights1, matrix local *weights2 ){
    vector hiddenLayer;

    if( !allocateVector( &hiddenLayer, (short)(weights1->numRows + 1) ) )
        return FALSE;
    if( !vectorTimesMatrix( input, &hiddenLayer, weights1 ) ){
        freeVector( &hiddenLayer ); return FALSE;
    }
    sigmoid( &hiddenLayer );
    hiddenLayer.values[ hiddenLayer.size - 1 ] = 1;
    if( !vectorTimesMatrix( &hiddenLayer, output, weights2 ) ){
        freeVector( &hiddenLayer ); return FALSE;
    }
    freeVector( &hiddenLayer );
    sigmoid( output );
    return TRUE;
}

static vector inputVector = {NULL, 0}, outputVector = {NULL, 0};
static matrix firstWeights = {NULL, 0, 0}, secondWeights = {NULL, 0, 0};

static short beenHereDoneThis = FALSE;

static short makeSureNetIsSetUp( void ){
    if( beenHereDoneThis ) return TRUE;
    if( !readNeuralNetWeights( &firstWeights, &secondWeights ) ) return FALSE;
    if( !allocateVector( &inputVector, firstWeights.numCols ) ) return FALSE;
    if( !allocateVector( &outputVector, secondWeights.numRows ) ) return FALSE;
    beenHereDoneThis = TRUE;
    return TRUE;
}

void removeNetFromMemory( void ){
    freeVector( &inputVector ); freeVector( &outputVector );
    freeMatrix( &firstWeights ); freeMatrix( &secondWeights );
    beenHereDoneThis = FALSE;
}

short nnEstimateHybAndXHyb( float local * hyb, float local * xHyb, char local * probe ){
    short probeLength, i;

    if( !makeSureNetIsSetUp() ) return FALSE;
    probeLength = (short)(strlen( probe ));

    for( i = 0; i < probeLength; i++ )

'^yb = outputVector.valuL 1. ^ ^' 



J xHyb = outputVector.valuesrn 



Table 6. lin_alg.c

#include "utils.h"
#include "lin_alg.h"
#include <alloc.h>

short allocateMatrix( matrix local * theMat, short rows, short columns ){
    short i;

    for( i = 0; i < rows; i++ ){

free( theMat->vaIuesril ) 
return FALSE;

.heMat^„^^.3 = rowM^^^^^ 

45 FALSE;) ^ ^ " ^^'^ aiiocate = vector-); 

    theVec->size = columns;
    return TRUE;
}

void freeVector( vector local * theVec ){
    free( theVec->values );
    theVec->values = NULL;
    theVec->size = 0;
}

void freeMatrix( matrix local * theMat ){
    short i;

    for( i = 0; i < theMat->numRows; i++ )
        free( theMat->values[i] );
    free( theMat->values );
    theMat->values = NULL;
    theMat->numRows = theMat->numCols = 0;
}

float vDot( float local * input1, float local * input2, short size ){
    float returnValue = 0;
    short i;

    for( i = 0; i < size; i++ )
        returnValue += input1[i] * input2[i];
    return returnValue;
}

short vectorTimesMatrix( vector local *input, vector local *output,
                         matrix local *mat ){
    short i;

    if( (input->size != mat->numCols) || (output->size < mat->numRows) ){
        errorHwnd( "illegal multiply" );
        return FALSE;
    }
    for( i = 0; i < mat->numRows; i++ )
        output->values[i] = vDot( input->values, mat->values[i], input->size );
    return TRUE;
}



It is understood that the examples and embodiments described herein are 
for illustrative purposes only and that various modifications or changes in light thereof 
will be suggested to persons skilled in the art and are to be included within the spirit and 
purview of this application and scope of the appended claims. All publications, patents, 
and patent applications cited herein are hereby incorporated by reference for all 
purposes. 

(b) hybridizing said pool of nnc • • ^''"P'*' 
P-besi..o5iH.edo„aJace.srll;r"^^^ 

is localized in a predetermined
region of said surface;

each different oligonucleotide is attached to said surface through a
single covalent bond;

--re..,odifferen.o.igonucleotidep^«I^^^^ ..prises 

subsequences of that 

5. The method of claim 1, wherein said oligonucleotides are from 5 to
45 nucleotides in length.

6. The method of claim 7, wherein said oligonucleotides are from 20 to
25 nucleotides in length.

7. The method of claim 1, wherein said oligonucleotides are synthesized 
by light-directed polymer synthesis. 

8. The method of claim 1, wherein said array comprises oligonucleotide
sequences from constitutively expressed control genes.

9. The method of claim 8, wherein said control genes are selected from
the group consisting of β-actin, GAPDH, and the transferrin receptor.

10. The method of claim 1, wherein the variation between different copies 
of each array is less than 20% wherein said variation is measured as the coefficient of 
variation in hybridization intensity averaged over at least 5 oligonucleotide probes for 
each gene whose expression the array is to detect. 

11. The method of claim 1, wherein said pool of target nucleic acids is
labeled with a single species of fluorophore.

12. The method of claim 1, wherein preparation of said oligonucleotide
probes does not require cloning, a nucleic acid amplification step, or enzymatic synthesis.

13. The method of claim 1, wherein preparation of said oligonucleotide 
probes does not require handling of any biological materials. 



14. The method of claim 1, wherein the concentration of nucleic acids in
said pool is proportional to the expression levels of said genes.



21. The method of claim 1 .

23. The method of claim 1 .

*^ of genes is loo 
24. The method of claim 1, wherein said hybridization is performed with
a fluid volume of about 250 µl or less.

25. The method of claim 1, wherein said quantifying comprises detecting
a hybridization signal that is proportional to the concentration of said RNA in said
nucleic acid sample.

26. The method of claim 1, wherein said quantifying comprises detecting
a hybridization signal that is proportional to the concentration of said target nucleic acids
for each gene in said pool of target nucleic acids.

27. The method of claim 1, wherein said hybridization comprises a
hybridization at low stringency of 30°C to 50°C and 6 X SSPE-T or lower and a wash
at higher stringency.

28. The method of claim 1, wherein said pool of nucleic acids is a pool
of mRNAs.

29. The method of claim 1, wherein said pool of nucleic acids is a pool
of RNAs in vitro transcribed from a pool of cDNAs.

30. The method of claim 1, wherein said pool of nucleic acids is
amplified from a biological sample.

31. The method of claim 1, wherein said pool of nucleic acids comprises
fluorescently labeled nucleic acids.

32. The method of claim 1, wherein said detecting comprises quantifying
fluorescence of a label on said hybridized nucleic acids at a spatial resolution of 100 µm
or higher.
nucleic acids; '^""^ ^ a pool of hybridized

10 double stranded regions: "^"""'^^ '^^'-g intact the hybridized 

33 -n^ ™«hod Of Claim ., w<,«™ s„, p^i,,„^ 

UOB Where ad piired tarjtr ^fic oligonucleMides a« r^„,„, 
density anay; ™'«°"""eotide probes in said high 

[) ■ 

''*^*"*^**P°°' of nucleic acids with RN»c.u. 
hybridizedfdoublestrandedjnucleicacidsequences; "'^"^ ^ '"^^^ the 

(c) isolating the remaining nucleic acid sequences h,xnn , 

"i"" providing comprises- 

nd«e specfioUy «* pan,cula,preselc«ed n,RNA u,s=. ^ssages: 
(b) Dealing said pool of nucleic acids wiih RNast H ,„ h 
(c) isolating or amplifying the remaining polyA+ RNA in said
pool.

37. A composition indicating the expression levels of a multiplicity of
genes, said composition comprising an array of oligonucleotide probes immobilized on a
surface, said array comprising more than 100 different oligonucleotides wherein:

each different oligonucleotide is localized in a predetermined
region of said surface;

each different oligonucleotide is attached to said surface through a
single covalent bond;

the density of said different oligonucleotides is greater than about
60 different oligonucleotides per 1 cm²; and

said oligonucleotide probes are complementary to subsequences of
said genes; and

said oligonucleotide probes are specifically hybridized to one or
more fluorescently labeled nucleic acids forming a fluorescent array such that the
fluorescence of said array is indicative of the transcription levels of said multiplicity of
genes.

38. The composition of claim wherein said fluorescence intensity is
proportional to the transcription levels of said multiplicity of preselected genes in a
biological sample.

39. The composition of claim wherein said array of oligonucleotides
further comprises mismatch control probes.

40. The method of claim 37, wherein each of said oligonucleotide probes
is chemically synthesized.

41. The composition of claim 40, wherein said oligonucleotides are from
5 to 43 nucleotides in length.

42. The composition of claim 43, wherein said oligonucleotides are from
20 to 25 nucleotides in length.



43. The composition of claim 41, 42, wherein said oligonucleotides are
synthesized by light-directed polymer synthesis.



44. The composition of claim 37, wherein said array further comprises
expression control probes having sequences complementary to one or more constitutively
expressed genes.



45. The composition of claim 44, wherein said constitutively expressed
genes are selected from the group consisting of β-actin, GAPDH, and the transferrin
receptor.

46. The composition of claim 37, wherein said pool of nucleic acids is a
pool of mRNAs.



47. The composition of claim 46, wherein said RNAs are in vitro
transcribed from a pool of cDNAs.

48. A kit for the detection of expression levels of a multiplicity of genes,
said kit comprising:

an array of oligonucleotide probes immobilized on a surface, said array
comprising more than 100 different oligonucleotides wherein:

each different oligonucleotide is localized in a predetermined
region of said surface;

each different oligonucleotide is attached to said surface through a
single covalent bond;

the density of said different oligonucleotides is greater than about
60 different oligonucleotides per 1 cm²; and

where, for each gene of said multiplicity of genes, said array
includes at least one oligonucleotide probe complementary to a
subsequence of said gene; and

instructions describing the use of said array for the quantification of
expression levels of said multiplicity of genes.

49. The kit of claim 48, wherein said oligonucleotide probes range in
length from 5 to 45 nucleotides.

50. The kit of claim 48, wherein said array further comprises mismatch
control probes such that for each probe specific to a gene there exists a mismatch control
probe.

51. The kit of claim 48, further comprising fluorescent label for labeling
RNA or DNA that is to be hybridized to the oligonucleotides of said array.

52. The kit of claim 48, further comprising buffers and reagents for the 
hybridization of RNA to the oligonucleotide probes of said array. 

53. A method of selecting a set of oligonucleotide probes that specifically
bind to one or more target nucleic acids, said method comprising:

(a) providing a high density array of oligonucleotide probes said
array comprising a multiplicity of oligonucleotide probes, wherein each probe is
complementary to a subsequence of said target nucleic acids and for each probe there is a
corresponding mismatch control probe;

(b) hybridizing said target nucleic acids to said array of
oligonucleotide probes; and

(c) selecting those probes where the difference in hybridization
signal intensity between each probe and its mismatch control is detectable.

54. The method of claim 53, further comprising:

(c) hybridizing said array to a pool of nucleic acids comprising
nucleic acids other than said target nucleic acids; and

(d) selecting probes having the lowest hybridization signal and 
where both the probe and its mismatch control have a hybridization intensity equal to or 
less than 10 times background. 



55. The method of claim 53, wherein said oligonucleotide probes range
in length from about 50 to about 45 nucleotides.



56. The method of claim 53, wherein said oligonucleotide probes are all 
the same length. 



57. The method of claim 53, wherein said difference in hybridization 
intensity between each probe and its mismatch control is at least 10% of the background 
signal. 

58. The method of claim 53, wherein said multiplicity of probes includes
all the probes of a single length that are complementary to a subsequence of said target
nucleic acid where said probes have a length between about 5 and 50 nucleotides.

59. The method of claim 53, wherein said array comprises more than 100
different oligonucleotides wherein each different oligonucleotide is localized in a
predetermined region of said surface and the density of said different oligonucleotides is
greater than about 60 different oligonucleotides per 1 cm² of said surface.



60. The method of claim 53, wherein said target nucleic acid is a nucleic 
acid derived from a gene. 



61. The method of claim 53, wherein said oligonucleotide probes are
synthesized by light-directed polymer synthesis.

62. The method of claim 53, wherein said mismatch control probes have
a centrally located 1 base mismatch.

63. The method of claim 53, wherein said hybridization comprises
hybridization at low stringency of 30°C to 50°C and 6 X SSPE-T or lower followed by
one or more washes at progressively increasing stringency until a desired level of
hybridization specificity is obtained.

64. The method of claim 63, wherein said pool of nucleic acids is a pool of
nucleic acids having a sense opposite that of the nucleic acids to which said
oligonucleotide probes are complementary.

65. In a computer system, a method of monitoring expression of genes,
the method comprising the steps of:

receiving input of hybridization intensities for a plurality of nucleic acid
probes including pairs of perfect match probes and mismatch probes, the hybridization
intensities indicating hybridization affinity between the plurality of nucleic acid probes
and nucleic acids corresponding to a gene, and each pair including a perfect match probe
that is perfectly complementary to a portion of the nucleic acids and a mismatch probe
that differs from the perfect match probe by at least one nucleotide;

comparing the hybridization intensities of the perfect match and mismatch
probes of each pair; and

indicating expression of the gene according to results of the comparing
step.

66. The method of claim 65, wherein the comparing step includes the 
step of calculating differences between the hybridization intensities of the perfect match 
and mismatch probes of each pair. 



67. The method of claim 66, wherein the comparing step includes the
step of calculating an average of the differences.

68. The method of claim 65, wherein the comparing step includes the
step of determining if a difference between the perfect match and mismatch probes of
each pair crosses a difference threshold.



69. The method of claim 66, wherein the comparing step includes the 
step of determining if a quotient of the perfect match and mismatch probes of each pair 
crosses a ratio threshold. 

70. The method of claim 69, wherein the comparing step includes the
step of determining a first number of pairs that have a difference that crosses the
difference threshold and a quotient that crosses the ratio threshold.

71. The method of claim 70, wherein the comparing step includes the
step of determining a second number of pairs that have a difference that does not cross
the difference threshold and a quotient that does not cross the ratio threshold.

72. The method of claim 71, wherein the indicating step indicates the 
gene is expressed if a quotient of the first and the second numbers crosses an expression 
threshold. 
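
For illustration only, the comparing and indicating steps of claims 66 to 72 may be sketched in C as follows. The type call_result, the function call_expression, and the reading of "crosses" as "is strictly greater than" are assumptions of this sketch. The ratio tests are written as multiplications so that a mismatch intensity of zero or an empty second count does not cause a division by zero; this presumes non-negative intensities.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t n_pos;     /* pairs crossing both thresholds (claim 70)    */
    size_t n_neg;     /* pairs crossing neither threshold (claim 71)  */
    double avg_diff;  /* average of PM - MM over all pairs (claim 67) */
    int expressed;    /* nonzero if n_pos / n_neg crosses expr_thr    */
} call_result;

/* pm[i] and mm[i] are the perfect match and mismatch intensities of
 * pair i. The threshold values are supplied by the caller. */
call_result call_expression(const double *pm, const double *mm, size_t n,
                            double diff_thr, double ratio_thr,
                            double expr_thr)
{
    call_result r = {0, 0, 0.0, 0};
    double sum = 0.0;

    assert(pm != NULL && mm != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        double diff = pm[i] - mm[i];
        int diff_ok = diff > diff_thr;
        int ratio_ok = pm[i] > ratio_thr * mm[i];  /* pm/mm > ratio_thr */

        sum += diff;
        if (diff_ok && ratio_ok)
            r.n_pos++;
        else if (!diff_ok && !ratio_ok)
            r.n_neg++;
    }
    r.avg_diff = sum / (double)n;
    /* n_pos / n_neg > expr_thr, written without the division */
    r.expressed = (double)r.n_pos > expr_thr * (double)r.n_neg;
    return r;
}
```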

73. The method of claim 65, wherein the plurality of nucleic acid probes
are attached to a surface of a chip, the plurality of nucleic acid probes having a density
greater than about 60 different nucleic acid probes per 1 cm².

74. In a computer system, a method of selecting probes for monitoring
expression of genes, comprising the steps of:

receiving input of a nucleic acid sequence constituting a gene;

generating a set of probes that are perfectly complementary to the gene;
and

identifying a subset of probes, including less than all of the probes in the
set, for monitoring the expression of the gene.

75. The method of claim 74, wherein the identifying step includes the
step of analyzing each probe of the set by criteria that specify characteristics indicative 
of low hybridization or high cross hybridization. 

76. The method of claim 75, wherein each of the criteria includes a 
threshold value such that if a selected probe has a characteristic that crosses the threshold 
value, low hybridization or high cross hybridization are indicated for the selected probe. 

77. The method of claim 76, further comprising the step of increasing at
least one threshold value to increase the probes in the subset.

78. The method of claim 75, wherein the identifying step is performed 
by a neural network that receives as input the probes of the set and outputs the probes of 
the subset. 

79. The method of claim 75, further comprising the step of determining 
the criteria as heuristic rules derived from multiple experiments. 

80. The method of claim 75, wherein one of the criteria indicates low 
hybridization or cross hybridization if occurrences of a specific nucleotide in a probe 
crosses a threshold value. 

81. The method of claim 75, wherein one of the criteria indicates low 
hybridization or cross hybridization if a number of a specific nucleotide that repeats 
sequentially in a probe crosses a threshold value. 



82. The method of claim 75, wherein one of the criteria indicates low
hybridization or cross hybridization if a length of a palindrome in a probe crosses a
threshold value.

83. The method of claim 75, wherein one of the criteria indicates low
hybridization or cross hybridization if a length of a subsequence within a probe that
includes only two specific nucleotides crosses a threshold value.
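
As a non-limiting illustration, three of the criteria recited in claims 80, 81 and 83 may be evaluated on a probe sequence as sketched below. All function names and the threshold parameters are hypothetical, and the palindrome criterion of claim 82 is not covered by this sketch. Probes are assumed to be NUL-terminated strings over the letters A, C, G and T.

```c
#include <assert.h>
#include <stddef.h>

/* Claim 80: number of occurrences of one nucleotide in the probe. */
size_t count_base(const char *probe, char base)
{
    size_t n = 0;
    for (; *probe; probe++)
        if (*probe == base) n++;
    return n;
}

/* Claim 81: longest sequential repeat of one nucleotide. */
size_t longest_run(const char *probe, char base)
{
    size_t best = 0, cur = 0;
    for (; *probe; probe++) {
        cur = (*probe == base) ? cur + 1 : 0;
        if (cur > best) best = cur;
    }
    return best;
}

/* Claim 83: longest subsequence made up of only nucleotides a and b. */
size_t longest_two_base_stretch(const char *probe, char a, char b)
{
    size_t best = 0, cur = 0;
    for (; *probe; probe++) {
        cur = (*probe == a || *probe == b) ? cur + 1 : 0;
        if (cur > best) best = cur;
    }
    return best;
}

/* Returns 1 if any criterion exceeds its (caller-chosen) threshold. */
int probe_rejected(const char *probe, size_t max_count,
                   size_t max_run, size_t max_stretch)
{
    static const char bases[] = "ACGT";

    assert(probe != NULL);
    for (size_t i = 0; i < 4; i++) {
        if (count_base(probe, bases[i]) > max_count) return 1;
        if (longest_run(probe, bases[i]) > max_run) return 1;
        for (size_t j = i + 1; j < 4; j++)
            if (longest_two_base_stretch(probe, bases[i], bases[j])
                    > max_stretch)
                return 1;
    }
    return 0;
}
```

Raising a threshold in such a scheme admits more probes into the subset, which corresponds to the adjustment described in claim 77.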